ON A UNIQUE FEATURE OF STATISTICS*


GEORGE W. SNEDECOR
Professor of Statistics, Iowa State College 


IN UNDERTAKING the presidency of the American Statistical Association,
my chief purpose was to do my part toward raising the standards
of our profession. I soon found it necessary to clarify my ideas about
the nature of those standards I hoped to raise. Statistics is a sprawling
subject covering loosely the collection of observational data, the sum- 
marization of these data, the drawing of conclusions based upon them, 
and pertinent mathematical theory. These processes and theories are 
the common property of many disciplines. Is there any unique feature 
that distinguishes the professional statistician from his fellows? If so, 
it should be the foundation on which standards are set up. 

It is immediately clear that if there is anything that characterizes 
the professional statistician this thing changes in time. The earliest 
preoccupation of statisticians was with military and economic affairs of 
the state—human and material resources for making war, and the 
spoils of a successful campaign. Much later came a long period in 
which statisticians, in the words of R. A. Fisher, “appear to have had 
no other aim than to ascertain aggregate or average values.” During 
this period, the theory of probability was extensively developed but its 
impact on statistical thinking was somewhat superficial. The present 
era of statistics is characterized by the emphasis on variation, notably 
sampling variation. Variation is interesting not only in itself as a well- 
nigh universal phenomenon, but more especially as one source of the 
uncertainty in all inductive reasoning. It seems that in our own time, 
the professional statistician’s peculiar function is to develop and publicize
the implications of variation. The future I do not pretend to know;
but the consequences of variation, in the broad sense of uncertain inference,
do not seem to have been entirely worked out. It is reasonable
to believe, then, that both at present and in the foreseeable future, the
professional statistician’s most useful contribution to science is in the
theory and practice of uncertain inference. Dr. Flood has put it rather
strikingly in this fashion: “The professional statistician reduces data to
numerical form and uses them to measure the fallibility of a conclusion
where fallibility is estimated exclusively from the data in hand.”

* Presidential address delivered at the 108th Annual Meeting of the American Statistical Association
on December 28, 1948.

The last clause of this definition should be emphasized in advance of 
further discussion. All scientists make judgments about fallibility. 
They scrutinize their data with care; conclusions are checked against 
the theories prevailing in their fields; if there is a reasonable doubt 
about the conclusion, additional data are collected. It is only after the 
investigator has satisfied himself, with some high degree of assurance, 
that his conclusions are valid that he releases them for the critical 
observation of his colleagues. The distinction between the professional 
statistician and his fellow scientists is that the statistician evaluates the 
uncertainty of the conclusion by use of the data themselves, the evalua- 
tion being in the form of an exact statement of probability. 

Two other facts should be observed. The first is that scientists are 
plagued with many variables beside those that can be reduced to meas- 
urement by the laws of chance. Inaccuracies of various kinds may 
creep in; it is only lack of precision, ordinarily numeralized in experi- 
mental or sampling error, that can (through appropriate conduct of the 
investigation) be measured. The inaccuracies may invalidate the con- 
clusion despite the fact that the statistical measure of fallibility indi- 
cates a high degree of confidence in it. From this, one may decide that 
the contribution of statistics, in the restrictive sense in which I am 
using the term, is of minor or even negligible utility. In what follows, 
I hope to show that such is not the case, at least so far as my experience 
and observation are evidential. 

The second fact to be observed is that the professional statistician 
and the investigator in economics, biology, engineering, etc. are usually 
the same person. It is merely for convenience that I mention them 
separately. Researchers in many fields have seized upon the sta- 
tistical devices for measuring uncertainty, so that I include them in 
my definition of professional statistician. My thesis is that the char- 
acteristic which distinguishes the present-day professional statis- 
tician is his interest and skill in the measurement of the fallibility of 
conclusions.

The layman, I am sure, would be surprised by such a statement. He 
is accustomed to the trappings of statistics rather than to the essence 
of it. To him statistics is symbolized by long rows of tedious figures
and their display in tables and charts. Even job specifications for stat- 
isticians are commonly limited to the arithmetical processes of calcu- 
lating averages, correlation coefficients, trends and probable errors. 
These are all useful procedures but they can be carried on as well by a 
clerk as by the professional—often, indeed, better. The professional 
statistician, whatever his other necessary qualifications, would seem 
to be set off from the layman by his habitual awareness of the fallibility 
of conclusions based upon data. It is not my purpose to advance further 
arguments for the thesis that statistics has this unique function, but 
merely to assume that it has and to discuss some of the consequences. 

First, let us look at the field of experimental science. In the familiar 
sequence of the scientific method—hypothesis, experiment, conclusion 
—the part which is peculiarly statistical is the conclusion. This involves 
a judgment about the fallibility of that “ . . . logically hazardous proc- 
ess—the process of generalizing from particular results.” I am quoting 
from Mood’s introduction of his “Theory of Statistics.” “The broad 
problem of statistical inference is to provide measures of the uncer- 
tainty of conclusions drawn from experimental data.” One highly de- 
veloped measure of uncertainty is the statistical test of hypothesis; 
this exemplifies the statistical part of the scientific method. 

Were the professional statistician to take no further interest in the 
procedures of the scientific method he would be fulfilling his essential 
share in them by evaluating fallibility, but he would fall far short of 
realizing his full usefulness to his fellow scientists. It is not until late 
in the sequence that the statistical part of the method finds its place. 
At this stage it is often discovered that faults in the design of the ex- 
periment make the measurement of fallibility either unnecessarily diffi- 
cult or wholly impossible; it is not unusual to find too late that the 
quantity of data furnished by the experiment is inadequate to detect the 
effects in question; and it may be evident that the experiment by 
change of design could have been made more sensitive, with consequent 
saving of effort and money. This means that the professional statis- 
tician may not only facilitate his own job at the end but may increase 
the efficiency of the experiment by modifying its design and may, in- 
deed, rescue it from failure by estimating its required size; all this by 
anticipating his own peculiar part in the conclusion. To me it is aston- 
ishing as well as gratifying that the professional statistician, in order to 
perform effectively his small but indispensable part in the scientific 
method, has been impelled to inspect the whole structure and has 
brought about substantial strengthening in many of its members. 

Turning next to the field of science in which the survey instead of
the experiment is the device used, we find statistics occupying some- 
what the same unique position. The objective of the survey may be 
either to get information about some hypothesis or to estimate one or 
more parameters of the population. A survey is planned and executed 
in order to get the necessary evidence. The conclusion, in so far as it 
is based on the data, is inductive in nature and is subject to uncer- 
tainty. It may be looked upon as the professional statistician’s particu- 
lar business to evaluate this uncertainty. 

As in experimental science, the professional statistician can enhance 
his usefulness by helping with the design of the survey. He can recom- 
mend designs that will furnish appropriate estimates of both position 
and scale; he can help choose the design that will be as efficient as is 
profitable; and he can specify the size of the sample that, with a desig- 
nated probability, may be expected to yield a satisfactorily small 
measure of the fallibility of the conclusion regardless of what this conclusion
may turn out to be.

But in this branch of science, the professional statistician is called 
upon to make more extensive contributions than those required in ex- 
perimentation. Not only must he concern himself with precision but 
more especially with conditions affecting accuracy. So far as I can 
judge, the majority of the surveys now in operation have sources of 
inaccuracy not amenable to measurement by means of the data ob- 
tained. Restriction of the sampling to regions non-randomly chosen 
and the purposive selection of respondents are inherent causes of in- 
accuracy in the usual type of quota sampling. These causes easily 
could be remedied. But the professional statistician cannot stop when 
the relatively simple procedures of sampling are improved. He will then 
have to join other scientists in their attack on the really tough prob- 
lems of schedule construction, selection and training of interviewers, 
and the little understood relations between interviewer and respondent. 
Although these problems are psychologic, social and economic, they 
affect the probabilities which control the measurement of uncertainty 
and therefore fall within the purview of the professional statistician’s 
interest. 

Why do professional statisticians in surveys have graver responsibili- 
ties than those in the experimental sciences? One reason is that the 
experiment is an older and better developed instrument than the survey, 
requiring less extensive and less obvious improvements. Another is 
that in the main experimenters are better trained for their work than 
are surveyors, and are heirs to a tradition of severe self-discipline. 
Operators of surveys are only beginning to feel the need for examining
their procedures: the embarrassment of the pollers in the recent election 
will, I hope, emphasize the necessity for higher standards among all 
samplers. If not, increasing loss of confidence by the public is sure to 
ensue. A third reason for the heavier responsibilities of professional 
statisticians in surveys is that controls are more difficult for investigators
who work with human material—Homo sapiens is notoriously
a difficult experimental animal. It may be years or even decades before 
the professional statistician’s part in the survey becomes so specialized 
as it now is in the experiment. 

Reverting next to my opening theme, it seems obvious to me that in 
assessing professional standing in statistics, expertness in evaluating 
the fallibility of conclusions should play a major role. In saying this, I 
am not ignoring the fact that most users of statistics will have little 
interest in qualifying as specialists in so narrow a branch of the subject. 
Statisticians (who may or may not rate as professionals) have astonishingly
varied activities. The collection of data, the planning that precedes
this collection, the summarizing processes that follow, the interpretation
and reporting of the results—these are preoccupations of
thousands of us. Other thousands, doubtless the majority, have only an 
incidental interest in professional statistics, their primary objectives 
being in the field of application—economics, industry, medicine, and 
dozens more: these usually are included in the fold of statistics because, 
in their own subject matter fields, they base their investigations on 
observational data. But all who use statistics have this in common: 
they are working toward conclusions based on more or less incomplete 
enumeration, conclusions that have the uncertainty of all induction. 
So, they are all concerned with the evaluation of uncertainty whether 
or not they specialize in this unique feature. I believe that every statis- 
tician will be more valuable in his own area if he clearly apprehends this 
universal characteristic of his material and that his professional com- 
petence will increase with his expertness in evaluating the fallibility of 
conclusions based upon such material. 

It is plain that I make a clear distinction between professional statis- 
ticians and statisticians in fields using statistics as a tool, statisticians 
who may have no proficiency in measuring uncertainty. These latter 
must of necessity make judgments about risk, but they often do this 
successfully without actual evaluation. They may be top-flight scien- 
tists or administrators but may never subject their probabilities ex- 
plicitly to measurement. It is the measurement of the uncertainties of 
conclusions that distinguishes the professional statistician and which 
makes him useful to other professionals in the various fields of application.

Professional statisticians may or may not be mathematicians. The 
more mathematics the better, but it is not essential. Of course, the 
mathematical statistician must develop the techniques of measurement 
and must carefully describe the conditions of their applicability. If 
unusual conditions are met, he must be called upon to devise appropri- 
ate new techniques. The non-mathematical professional statistician 
must gain experience in the subject-matter fields. He cannot assuredly 
evaluate the uncertainty of conclusions unless he is intimately ac- 
quainted with the uncertainties in the data which he uses for his meas- 
urements. 

To some, on first thought, it may appear that I am suggesting unnecessarily
rigorous standards. After more careful consideration they
will agree, I think, that this is not so. There is nothing essentially difficult
in the idea of variation and its consequences; I have found that
students in a first course in statistics easily grasp the concepts. The idea 
is certainly not new though it has received increasing emphasis during 
the last thirty or forty years. Actually I am making the modest sugges- 
tion that professional attainment in statistics be gauged by attitudes 
toward statistical thinking of the present rather than of the past. It 
seems to me that up-to-dateness is a minimum standard for profes- 
sionalism in any field. In my thinking, standards in professional sta- 
tistics must be based, at least in part, on modern developments in the 
subject; they must include not only proficiency but preoccupation in 
the measurement of uncertainty. 

Let us now consider the application of my thesis to education. Funda- 
mentally I am a teacher. I think my chief contribution to statistics is 
the training of hundreds of budding scientists in the straight and nar- 
row way of uncertain inference. Until recently my field has been a nar- 
row one, limited almost entirely to the statistics of biological experi- 
mentation. Only during the last year have I begun the development of 
a course in elementary statistics with broad cultural objectives. This 
field seemingly is without limits. The unique feature of statistics, the 
evaluation of risk, is part of the daily and hourly living of every one of 
us. Uncertainty envelops us, and success or failure in life is the summa- 
tion of myriads of decisions as to which is the least hazardous course. 
It would seem that one or more courses in statistics would be part of 
every student’s training: but, as Dr. Walker said in her presidential 
address of 1944, “I have never heard of a liberal arts college that under- 
took to explain to its students the stochastic nature of the universe in 
which they live and move and have their being.” Why is this? I fear it is
because we, as teachers, have failed to make the subject vital and con- 
vincing. I am afraid we have emphasized the calculational and graph- 
ical devices rather than the essential nature of the subject. Instead of 
devoting our energies to bringing the student into harmony with his 
physical, biological and economic environment of variation and un- 
certainty, we have bored him with another course in arithmetic and 
algebra from which meaning is largely omitted. The nature of decisions 
based on probability, experience in sampling together with the con- 
comitant risk in drawing conclusions, the fundamentals of our great 
cooperation in insurance, the social implications of betting—these are a 
few of the numerous facts of life that should form the structure of our 
courses in statistics. I believe that if, during the past fifty years, a 
realistic, living statistics had been taught, the subject would now be 
considered indispensable by most of our college administrators. 

I think it an auspicious sign that a section on business statistics is 
being considered during this annual meeting. Business and statistics are 
blood brothers in that risk is basic in both. Yet most of our instruction 
in business statistics either ignores this common heritage or touches 
upon it vaguely in a chapter on sampling tucked away in the latter 
part of the book. About the only risk the student seems to be made con- 
scious of is the risk of a mistake in arithmetic. Is it too much to hope 
that some of our forward-looking business executives take the initiative 
in advocating the elimination of unrealistic courses in statistics from 
our curricula and the substitution of functional courses in their stead? 
I judge this could be done in half a dozen years by an energetic organ- 
ization with adequate resources. After all, business is a consumer of 
the college product and is in a commanding position to insist on quality 
control of the output. 

The lack of professional standards for teachers of statistics in our 
colleges and universities is an astonishing feature of our times. In 
fields other than statistics, even an instructor is often required to have 
the doctor’s degree in the subject matter of the department; the 
bachelor’s degree is an almost universal minimum requirement. Yet
how many teachers of statistics have been graduated from a curriculum 
in statistics? It apparently never occurs to the head of a department of 
education, for example, to ask a prospective teacher of statistics about 
his degrees in statistics. Most of us are graduates of such departments 
as economics, mathematics, business or psychology. Our academic 
training in statistics may have been no more extensive than the courses 
we now teach. Personally, we are not to blame for this because in our 
generations there were no curricula in statistics. Even at present they
are distressingly few. But we as teachers should be aggressively dissatis- 
fied with such a condition. We should work for the establishment of 
departments of statistics and should each strive to improve his own 
professional standing. We should resolve that the next generation of 
teachers shall have advantages not available to us. Our Section on the 
Training of Statisticians has the glorious opportunity of leading in this 
high endeavor. 

What are the implications of my thesis for the American Statistical 
Association? Under our new constitution we have abandoned our pre- 
occupation with any one subject-matter field and have volunteered our 
services as a focus of statistics among them all. In this capacity, we 
were asked to participate in the work of the Hoover Commission, sug- 
gesting desirable reorganization in the statistical agencies of the gov- 
ernment. We are joining the Social Science Research Council in review- 
ing the recent election predictions of three of the most prominent poll- 
ing organizations. Two government bureaus have asked criticism and 
suggestions for their programs. In meeting such responsibilities, our 
Commission on Statistical Standards and Organizations seems destined
to wield a powerful influence in statistical affairs. These things can be 
done only because we have in our membership professional statisticians 
of the highest competence. To maintain and enlarge this leadership, we 
must attract to our ranks other able statisticians from all subject- 
matter fields, professional statisticians whose skills include that unique 
function of modern statistics, the measurement of the uncertainty of 
inductive conclusions. 


AN ATTEMPT TO GET THE “NOT AT HOMES”
INTO THE SAMPLE WITHOUT CALLBACKS 


ALFRED POLITZ
AND 
WILLARD SIMMONS 


PART I 


This paper describes a plan for eliminating the need for call- 
backs. Each person in the sample is visited only once. From 
each person interviewed information is obtained as to whether 
or not he was at home on specific instances, including the 
instance of the interview, which permits an estimate of the 
proportion of time he is at home during the interviewing hours. 
Questionnaires are divided into, e.g., 6 groups according to the
estimated proportion of time persons in each group are at
home, viz., 1/6, 2/6, ..., 6/6 of the time. The sample estimate,
for any variable under study, is produced by weighting
the results for each group by the reciprocal of the estimated
per cent that persons are at home. It is shown that under certain
conditions this estimate is unbiased and the variance of
the estimate is obtained. A numerical comparison is made between
this plan and the usual method of calling back.


MANY INDIVIDUALS are not available for an investigation on the
first visit because they are not at home when the interviewer calls
on them. These cases are often referred to as the “not at homes.” De- 
pending on the time when interviews are feasible and on the kind of 
individuals under question, the percentage of “not at homes” usually 
varies between 30 and 60. The “not at homes” thereby constitute a 
factor of extreme importance. The simplest theoretical device for the 
completion of the sample consists of revisiting, again and again, the 
homes where a certain individual was not found on the first call, until 
the particular individual is found. These callbacks are spread thinner 
and burdened with longer travel time than first-call interviews. The 
second visit is more expensive than the first visit, the third visit is more 
expensive than the second one. The economic burden increases with 
subsequent calls to the point that a certain percentage of attempted 
interviews usually is considered unobtainable. The increased costs per 
information unit derived from callbacks make it, in most cases, advisable
not to attempt revisiting all the “not at homes” of the primary sample,
but to revisit only a sub-sample of it. While sub-sampling increases
the sampling error, it still need not introduce biases. The biases start 
only where revisiting of “still not at homes” stops. With the callbacks as 
a major source of expense in unbiased population samples, it has been 
worthwhile to study the possibility of circumventing the need for them 
altogether. During the past three years, we have developed a plan for 
eliminating the need for callbacks and several experiments have been 
made applying this plan to market surveys.²

The first step may be a review of the meaning of the “not at homes.” 
1) If the survey is concerned with items open to observation within the 
household or with items about which nearly every adult member of the 
household can give information, it usually suffices to design a sample 
of households. An investigation of a household in the sample becomes 
impossible if nobody is at home (more accurately, no adult is at home) 
at the time when the interviewer rings the doorbell. 2) If the investiga- 
tion is concerned with problems where an individual reports about him- 
self (buying habits, opinions on social and political issues, taste prefer- 
ences, etc.), a sample of individuals is designed. Under these circum- 
stances, it no longer suffices that somebody (some adult) is at home. 
A particular individual has to be found. If this particular individual is 
not at home when the interviewer rings the doorbell, the information 
cannot be obtained. This paper is concerned solely with samples of in- 
dividuals. It is obvious that the not-at-home rate in samples of indivi- 
duals is higher than the not-at-home rate in samples of households. 

1 A plan for sub-sampling non-responses to mail questionnaires is discussed by Morris H. Hansen
and William N. Hurwitz, The Problem of Non-Response in Sample Surveys, Journal of the American
Statistical Association, December, 1946, page 517. The principles set forth for determining the size of
the original sample and the size of the sub-sample of non-responses to maximize sampling efficiency for
a given cost are applicable to the “not at home” problem.

2 It has recently been brought to the authors’ attention that a somewhat similar plan was proposed
independently by H. O. Hartley before the Royal Statistical Society. This proposal was made in commenting
upon a paper by F. Yates, A Review of Recent Developments in Sampling and Sampling Surveys,
Journal of the Royal Statistical Society, Vol. CIX, Part I, 1946, page 37.

If several callbacks are made in order to reach individuals not found
at home on the first visit, the assumption is maintained that the individual
not at home at one instance will be at home some time during those
hours when personal visits are possible. Individuals who are at home
only during night hours, let’s say from 10 p.m. to 8 a.m., drop out of
almost every sample. By leaving such inaccessible extremes aside, we
may say that people who are not at home at the first visit but can be
found in a second, third, fourth, fifth or sixth visit, are persons who
stay away from home more often by varying degrees. The average frequency
of staying away from home among the second call respondents
is higher than among the first call respondents; the average frequency 
of staying away from home among the third call respondents is higher 
than among the second call respondents, and so forth. Because of the 
fact that some people are away from home more often, it becomes 
necessary on the average to visit their homes more often before they 
can be found at home. But at some time, if callbacks are continued indefinitely,
they actually are found by the interviewer. Let’s say an
interviewer finds the respondent, Mr. Smith, on the occasion of the
third call at 8 o’clock in the evening on Wednesday. If the survey schedule
and interviewing assignment had accidentally brought the interviewer
to the home of Mr. Smith in the first place at 8 o’clock on Wednesday
night, he would have been found at home at the first visit. Mr.
Smith then never would have belonged to the group of “not at homes.”

This may make it obvious that every sample of “at homes” must in- 
clude “potential not at homes.” To put it more accurately, every set of 
first call interviews of timing A must include respondents who are 
“not at homes” in another set of interviews of timing B. It must include 
respondents who are not at home in timing C and it must include re- 
spondents who are not at home in timing B and timing C. Statistically, 
therefore, it must be possible to reconstruct from a present “at home”
sample, past samples of “at homes” and “not at homes,” if: (a) respondents
provide information on their past “at home” performance,
and (b) the individuals in the present “at home” sample are visited at
times chosen at random.

Consider, for example, the following three groups, among which all 
individuals in the population are distributed: 1) those who are at home, 
on the average, 20% of the time, 2) 50% of the time, and 3) 80% of the 
time. If the time of visits is determined at random, we would expect to 
find on the first call about 20% of group (1), 50% of group (2), and 80%
of group (3). Now if each person in the sample can only be identified
with the group to which he belongs, a correction for the under-repre- 
sentation of each group is clearly indicated. Since only about one-fifth of 
the persons in the first group are interviewed, this group is assigned a 
weight of 5. Likewise, the second group receives a weight of 2 because 
only about half of the persons in this group are found at home, while 
the third group receives the weight of 1.25. This weighting, of course, 
does not completely eliminate the bias, for it takes into account only 
three arbitrarily defined groups. On the other hand, the bias must be 
reduced because the weighting has at least partially compensated for 
the under-representation of persons frequently away from home.
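
The arithmetic of the three-group example can be checked with a short sketch. Only the at-home proportions (20%, 50%, 80%) and the resulting weights (5, 2, 1.25) come from the text; the group sizes and the group means of the survey variable are invented here purely for illustration, and the function names are hypothetical.

```python
# Sketch of the three-group example. The at-home proportions come from the
# text; the group sizes and group means are hypothetical illustration values.
GROUPS = [
    # (proportion of time at home, persons in group, mean of survey variable)
    (0.20, 1000, 10.0),
    (0.50, 1000, 20.0),
    (0.80, 1000, 30.0),
]


def group_weights(groups):
    """Weight for each group: the reciprocal of its at-home proportion."""
    return [1.0 / p for p, _, _ in groups]


def population_mean(groups):
    """True mean over everyone in the population, at home or not."""
    total = sum(n * y for _, n, y in groups)
    return total / sum(n for _, n, _ in groups)


def first_call_mean(groups, weighted):
    """Expected mean among persons found on one randomly timed call."""
    num = den = 0.0
    for p, n, y in groups:
        found = n * p                      # expected number found at home
        w = 1.0 / p if weighted else 1.0   # reciprocal weight, or no weight
        num += w * found * y
        den += w * found
    return num / den


if __name__ == "__main__":
    print("weights:", group_weights(GROUPS))
    print("population mean:", population_mean(GROUPS))
    print("unweighted first-call mean:", first_call_mean(GROUPS, weighted=False))
    print("weighted first-call mean:", first_call_mean(GROUPS, weighted=True))
```

With these invented figures the unweighted first-call mean lies above the population mean, because the group most often at home is over-represented; the weighted mean agrees with the population mean up to rounding.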

The number of such groups, however, need not be restricted to three. 
With obvious modifications, the above example is applicable to any 
number of such groups into which the population might be divided, 
where each group contains persons who are at home any part of the 
time during which interviewing is in progress. Consider the limiting 
case in which there are as many such groups as there are persons in the 
population; that is, each person is a group of one. If it were possible to 
assign to each individual found at home, a weight equal to the recipro- 
cal of the per cent of time he spends at home, the not-at-home bias 
would vanish.* Such weighting would completely compensate for the 
under-representation of persons frequently away from home. 
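
The reason reciprocal weighting removes the bias in this limiting case can be written out in one line. The notation below is introduced here for illustration and is not the authors' own (their derivation is deferred to Part II). Let y_i be the value of the variable for person i, let p_i > 0 be the proportion of the interviewing hours he spends at home, and let \delta_i equal 1 if he is found at home on a single randomly timed visit and 0 otherwise, so that E(\delta_i) = p_i. Treating, for simplicity, every one of the N persons in the population as visited once:

```latex
\[
E\!\left[\sum_{i=1}^{N} \frac{\delta_i}{p_i}\, y_i\right]
  = \sum_{i=1}^{N} \frac{E(\delta_i)}{p_i}\, y_i
  = \sum_{i=1}^{N} \frac{p_i}{p_i}\, y_i
  = \sum_{i=1}^{N} y_i .
\]
```

The condition p_i > 0 is essential: a person who is never at home during interviewing hours cannot be recovered by any weight, which matches the earlier remark that such inaccessible extremes drop out of almost every sample.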


Estimating the Per Cent of Time Respondents Are At Home 


While it is hardly possible to find out the exact percentage of the time 
an individual is at home, it has proven feasible to estimate this percent- 
age from information obtained by direct questions to the respondent 
himself. The problem of phrasing such questions has naturally received 
considerable attention and experimentation. Any such question as, 
“Are you usually at home in the evening?”, of course, is valueless. A 
more specific question, such as, “How many nights out of the last five, 
were you at home?”, is much better, but is still subject to two objec- 
tions: 1) the respondent is likely to answer without thinking back over 
his activities on the previous five evenings, and 2) no provision is made 
for the respondent who was at home during part of an evening and 
away for the remainder of the evening. While further improvements are 
possible, the following questions have proven to be satisfactory: 1) 
Would you mind telling me whether or not you happened to be at home 
last night at just this time? 2) How about the night before last at this 
time? 3) How about Wednesday night? 4) How about Tuesday night? 
5) Monday night? These questions relate specifically to interviews con- 
ducted on Saturday and the particular days of the week mentioned in 
questions 3, 4 and 5 are changed, as appropriate for interviews con- 
ducted on other nights. To alleviate any possible resentment at the 
personal nature of the inquiry, interviewers find it helpful to preface 
the questions by some statement like, “We are also interested in finding 
out how often people go out in the evening at various times and on 
various days of the week. I wonder if you would mind telling me... 
etc.” 





4 See Part II, page 22, footnote 10. 
It is important to decide upon an optimum number of nights about 
which inquiries should be made, bearing in mind the limitations of re- 
spondents’ memories and willingness to cooperate, as well as the obvi- 
ous advantage of having information for as many nights as possible. 
Experience with this technique in several field operations has led to the 
acceptance of information for five previous nights as perhaps the most 
suitable, where evening interviews are conducted Monday through 
Saturday. Since the interviewer obtains information about one night
by observation, this makes a total of six nights upon which an estimate
may be based of the actual per cent of time respondents are at
home. A clear advantage lies in the fact that six nights coincide with a
complete week of interviewing, so that the effect of any tendency of
respondents to go out more frequently on certain nights of the week is
eliminated. Experience with this plan has indicated that respondents
are almost always able and willing to give answers for five nights previ- 
ous. This is borne out by the extremely small number of non-responses 
and “don’t knows” to these questions. Because the questions relate to 
an individual's personal activities, it is the type of information re- 
spondents do know, and are not reluctant to impart. 


Sample Projections Based on Unbiased Estimates of the Time Respondents 
are At Home 


Information concerning the number of nights each respondent is at 
home, out of six specified nights at a particular time, provides an un- 
biased estimate of the actual per cent of the time each respondent is at 
home.4 This information makes it possible to divide the respondents
into six groups, each having a weight depending upon the proportion of 
such individuals expected to be found at home, as follows: 








 (1)               (2)                  (3)
Group     Estimated proportion        Weight
          of time spent at home

  1               1/6                   6.0
  2               2/6                   3.0
  3               3/6                   2.0
  4               4/6                   1.5
  5               5/6                   1.2
  6               6/6                   1.0





The weights in column (3) are the reciprocals of the estimated pro- 
portion of the time individuals in each group are at home (column 2). 
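
As a check on column (3), the weights can be recomputed as reciprocals of the proportions in column (2). The following is a minimal Python sketch; the function name `group_weights` and the use of exact fractions are illustrative choices, not part of the original plan.

```python
from fractions import Fraction


def group_weights(nights=6):
    """Return {group: weight}, where group i was at home on i of `nights` nights."""
    weights = {}
    for i in range(1, nights + 1):
        proportion = Fraction(i, nights)  # column (2): estimated share of time at home
        weights[i] = 1 / proportion       # column (3): reciprocal of that share
    return weights


if __name__ == "__main__":
    for group, weight in group_weights().items():
        print(group, float(weight))
```

Exact fractions avoid rounding in the intermediate step; the weights are converted to decimals only for display.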





4 See Part II, page 18. 
Information obtained from interviews with individuals in each group 
may be multiplied by the corresponding group weight to produce an 
estimate for all individuals in the group, of those originally in the sam- 
ple, including persons not at home. There is one group, however, in 
the original sample, from whom no interviews have been received; 
that is the group containing those persons who were not at home on any 
of the six nights, including and preceding the night the interviewer 
called at their homes. This group, of course, is comparable to the group 
which is not reached even after five callbacks. 

If information is obtained for six nights at random, estimates based 
on this technique are subject to no not-at-home bias other than the bias 
contributed by the relatively small group who are not at home on any 
of six selected nights.5 As far as bias is concerned, therefore, such an 
estimate is equivalent to that obtained by a sampling plan in which six 
calls are made when necessary to find the “not at homes.” In making 
this comparison, however, some allowance must be made for the addi- 
tional contribution to the sampling error because of failure to obtain 
interviews from any “not at homes.” In actual practice, it has been 
found that this increase in sampling error seldom exceeds 2% (coefficient
of variation). Moreover, for almost all situations likely to be
encountered in practice, the use of this technique in combination with
only one callback will produce more reliable estimates, considering 
both the bias and sampling error, than may be obtained from a five- 
callback interviewing operation which does not provide for information 
on previous “at home” performance. 

The great expense of repeated callbacks, however, has already led to 
extensive sub-sampling of the “not at homes.” This procedure, devel- 
oped by the Census Bureau, has gained wide acceptance by scientific 
samplers, because it frequently produces closer estimates for the cost. 
In a sub-sampling operation, however, consideration must also be 
given to increased sampling error. The addition to sampling error, oc- 
casioned by a sub-sampling operation, is likely to be greater than the 
increase resulting from use of the proposed plan, even assuming an op- 
timum allocation of interviews on first call, second call, and subsequent 
callbacks. In fact, the numbers of persons, discovered by inquiry, who 
are at home one night out of six, two nights out of six, etc., will usually 
correspond fairly closely with the optimum allocation of callback inter- 
views, if this information had been available. This assumes, of course, 
that the optimum numbers of interviews to be obtained on successive 
callbacks are determined after taking into account the increased costs 





5 See Part II, page 21. 
of obtaining interviews after repeated callbacks. It would seem, therefore,
that for population studies employing samples based on probability
theory instead of the inevitable errors in judgment, the use of
this plan will yield impressive economies of both time and money.

While this paper uses the past 5 days’ performance with regard to 
being at home as the basis for the development of strata, workers who 
want to use this method may deal with situations in which a smaller 
number of days is justified. The decision they have to face is similar to 
deciding on the maximum number of callbacks to be made on “not at 
homes.” On a special survey in which the writer had to get a measure- 
ment of the biases in a judgment sample, it was considered necessary 
to make up to eight callbacks. In surveys where less precision is re- 
quired, three or even two callbacks are set as a maximum. It is impos- 
sible, without reference to the particular subject under study, to make 
a final statement about the number of callbacks necessary, or about its 
approximate equivalent; that is, the number of days’ past performance 
that should be covered. 

While the mathematics of the “not at home” calculation is explained 
in Part II, one point, of a psychological nature, may require a reference 
from practical experience. It is the question mentioned before as to 
whether respondents can report about their past at-home performances 
with sufficient accuracy. There is no doubt that many survey questions 
burden the memory and honesty of the respondent much more than 
the at-home question. However, it would not be good policy to take 
possible inaccuracies in sampling lightheartedly just because surveys 
as a whole have their weaknesses anyway. It is for this reason that 
actual field experiences should be quoted. Following small scale experi- 
ments which cleared the path, the method has been employed since the 
latter part of 1947 in major area surveys. 

It is well known that men and women differ substantially in their at- 
home performance. Therefore, in a survey of the Philadelphia metro- 
politan area, the proportion of females and males in the population 
was left to discovery by the survey using the new device. 


PHILADELPHIA METROPOLITAN AREA 








Per cent males directly       Per cent males estimated from     Per cent males estimated by
counted in at-home sample     at-home questions                 U. S. Census, 1947

        43.1%                         47.8%                             47.4%








The difference between the survey estimate and the Census estimate is 
well within the sample tolerance. 
An internal check on response reliability is provided in this plan 
without resorting to comparisons with Census data. On the one hand, 
a record may be kept by interviewers of the number of persons found 
at home and the number not at home of all persons visited. This permits 
a direct estimate of the per cent of persons who are at home when the 
interviewer calls. On the other hand, the expected per cent of persons 
at home at a given time chosen at random is equivalent to the average 
per cent of the time all persons are at home. This may be estimated 
directly from the information obtained from respondents concerning 
the number of nights each respondent was at home out of the past six 
nights. This comparison between two independent estimates of the 
average per cent of persons at home is usually made in order to check 
the over-all accuracy of respondents’ answers, interviewers’ records,
and any other source of error. The results of this check for the Chicago
survey are as follows: 


CHICAGO METROPOLITAN AREA 

















                                                      At home     Not at home
                                                      per cent     per cent
Based on actual interviewers’ records of number
  of persons visited and number found at home           61.1         38.9
Based on respondents’ answers concerning the
  number of nights they are at home                     61.5         38.5





It is advisable for anyone who may use the method in the future to 
maintain a direct count of the individuals not found at home, although 
it does not add to the information which is sought by the survey. How- 
ever, the direct count provides an elegant internal statistical check on 
the reliability of the response to the not-at-home question, and thereby 
indirectly, a check on the interviewers’ carefulness in dealing with the 
respondents. 
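
The two sides of this internal check can be sketched in a few lines of Python. This is one plausible reading of the check, in line with the comparison of the two ratios discussed in Part II: each respondent at home on i of the six nights is taken to stand for 6/i persons. The function names and the answer counts below are hypothetical and are not survey data.

```python
def rate_from_records(found_at_home, visited):
    """Share of visited persons whom the interviewer actually found at home."""
    return found_at_home / visited


def rate_from_answers(nights_at_home, nights=6):
    """Average share of time at home implied by respondents' answers.

    A respondent at home on i of the six nights stands for 6/i persons,
    so the respondent count divided by the weighted count estimates the
    average per cent of time at home.
    """
    weighted_persons = sum(nights / i for i in nights_at_home)
    return len(nights_at_home) / weighted_persons


# Hypothetical answers: 10 respondents at home one night, 20 two nights, and so on.
answers = [i for i in range(1, 7) for _ in range(10 * i)]

if __name__ == "__main__":
    print(round(rate_from_answers(answers), 3))   # estimate from respondents' answers
    print(round(rate_from_records(210, 360), 3))  # estimate from interviewers' records
```

A large gap between the two printed rates would point to response error, careless records, or an unusually large zero group.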
FURTHER THEORETICAL CONSIDERATIONS 
REGARDING THE PLAN FOR ELIMINATING 
CALLBACKS 


PART II 


THE PLAN FOR eliminating callbacks described in Part I may be summarized
briefly as follows:


1) Each person in the sample is visited once and only once at a 
time determined at random, considering only the periods during 
which interviews are to be conducted. 

2) From each person interviewed, information is obtained as to 
whether or not he was at home at six specific instances, deter- 
mined at random, including the instance of the interview, which 
permits an estimate of the proportion of time he is at home 
during interviewing hours. 

3) Questionnaires are divided into six groups according to the
estimated proportion of time persons in each group are at home,
viz., 1/6, 2/6, ..., 6/6 of the time, for groups one to six,
respectively.

4) The sample estimate, for any variable under study, is produced
by weighting the results for each group by the reciprocal of the
estimated per cent of time persons in the group are at home.
Thus, the weights for groups one to six are, respectively, 6/1,
6/2, ..., 6/6.
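
Steps 3 and 4 amount to a single weighted sum over the persons interviewed. The Python sketch below is only an illustration under that reading; the function name `weighted_total` and the sample values are hypothetical.

```python
def weighted_total(values, nights_at_home, nights=6):
    """Estimate a population total from the persons found at home.

    values[k] is the variate reported by respondent k and nights_at_home[k]
    is the number of nights (1 to 6, interview night included) on which that
    respondent was at home. Each value is weighted by 6/i.
    """
    if len(values) != len(nights_at_home):
        raise ValueError("values and nights_at_home must have the same length")
    total = 0.0
    for x, i in zip(values, nights_at_home):
        if not 1 <= i <= nights:
            raise ValueError("a respondent must be at home on 1 to 6 nights")
        total += x * nights / i
    return total


if __name__ == "__main__":
    # Four hypothetical respondents; the variate is 1 if a characteristic is present.
    print(weighted_total([1, 0, 1, 1], [6, 3, 2, 1]))
```

A respondent reporting zero nights is rejected because anyone interviewed was, by construction, at home on at least the night of the interview.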


Assumptions Made 

The population to which the sample estimate relates is restricted to 
those individuals who are at home at least some time during interviewing
hours; that is, those persons who could eventually be found by callbacks
during regular interviewing hours. The decision concerning the
hours of interviewing is, of course, extremely important to the survey
results, and the shorter the daily interviewing period, the larger the
number of persons arbitrarily given no chance of being found at home.6
For example, employed persons thus may be excluded from daytime 
interviews, persons attending night school may be excluded from even- 
ing interviews; whereas, neither group is excluded from an interviewing 
schedule including both daytime and evening hours. In the limiting 





6 Because more persons are usually at home at certain times than at other times, further consideration
of the “optimum” periods during the day for interviewing may be worthwhile, possibly leading to
a stratification by time of day and to use of the principles for optimum allocation of sample cases to such
strata.
case, the excluded group consists only of those persons who are never 
at home, or who have no home, since interviews are theoretically possible
at all hours. In the following discussion it will be convenient to consider
an experiment in which each of the $N_t$ persons in the population is
visited one time. Because the not-at-home problem is no different for a
probability sample than for an attempted total census of the entire
population, this involves no loss of generality. Let us assume that interviews
are obtained from all of the $n$ persons who are found at home, i.e.,
no person refuses to be interviewed, and that $(N_t - n)$ persons are not at
home when the interviewer calls. Now in effect we have a sample of $n$
individuals in which the probability of including any person is equal 
to the probability that person is at home when the interviewer calls. 

The random choice of a time of visiting each person is to avoid the 
arbitrary exclusion of persons who are never at home, for example, at 
the time of day and day of the week at which, otherwise, it may be decided
arbitrarily to visit them.7 When an interviewer rings a doorbell,
he is sampling time. He has chosen at random one particular moment 
from a large number of possible moments to ascertain whether or not 
the respondent is at home. The chance that an individual is inter- 
viewed, therefore, is exactly equal to the per cent of the time that indi- 
vidual spends at home, counting only the hours during which inter- 
views are conducted. Moreover, for each of the n individuals who do 
happen to be at home, the interviewer obtains a sample of five addi- 
tional points in time. The questions suggested in Part I, together with 
the interviewer’s observation at the time of his visit, provide a system- 
atic sample from a random start of a cluster of six moments spaced 
twenty-four hours apart. It may be easily shown that this sample pro- 
vides an unbiased estimate of the per cent of time each individual is at 
home. Let $p_{jk}$ equal unity if the $j$th person is at home at the $k$th moment;
otherwise $p_{jk}$ equals zero. Since the moments were selected with equal
probability, it follows that

$$\hat{p}_j = \frac{1}{6}\sum_{k=1}^{6} p_{jk}$$

is an unbiased estimate of $p_j$, the actual per cent of time that the $j$th individual is at home
($p_j = \frac{1}{M}\sum_{k=1}^{M} p_{jk}$, where $M$ is the total number of moments of interviewing).8
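
The unbiasedness of the six-moment estimate can be checked by brute force for a single person. The Python sketch below is illustrative only: the at-home pattern is made up, and the nights are treated as circular so that every possible random start yields a full cluster of six, which is an added assumption.

```python
from fractions import Fraction


def estimates_over_all_starts(at_home, cluster=6):
    """Six-night estimates of the share of time at home, one per possible start.

    at_home is a list of 0/1 indicators over M consecutive nights, treated as
    circular (an assumption made here so every start gives six nights).
    """
    m = len(at_home)
    estimates = []
    for start in range(m):
        hits = sum(at_home[(start + k) % m] for k in range(cluster))
        estimates.append(Fraction(hits, cluster))
    return estimates


# Hypothetical record for one person over twelve nights (1 = at home).
pattern = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
estimates = estimates_over_all_starts(pattern)
mean_estimate = sum(estimates) / len(estimates)
true_share = Fraction(sum(pattern), len(pattern))

if __name__ == "__main__":
    print(mean_estimate, true_share)
```

Because every night belongs to exactly six of the possible clusters, the average of the estimates over all equally likely starts equals the person's true share of nights at home.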





7 A random choice of the time of calling on any individual may be approximated closely in actual 
operations without creating any severe administrative problems. It does not require, for example, that 
the selection of a time of visiting different individuals be independently at random. It is quite feasible 
to select first at random an evening on which all persons in a particular location or cluster will be visited, 
to order the visits for convenient travel between homes, selecting at random only the point within the 
cluster at which the interviewer is to begin making calls at a specified time. While not strictly meeting 
the requirements of randomness, this procedure results in a fairly close approximation. 

8 A moment is chosen merely for convenience. Actually any other unit of time would serve as well
provided it is understood that the interviewer does not wait for respondents to return home but simply
takes the time necessary to ascertain whether or not the respondent is already at home when the doorbell
rings. If one prefers to consider the unit of time infinitesimal, $\hat{p}_j$ may be shown to be unbiased by
employing the integral $\int p_{jt}\,dt$.
We may further define

$$\bar{p}_t = \frac{1}{N_t}\sum_{j=1}^{N_t} p_j$$

as the average per cent of time all $N_t$ persons in the population are at home.


Since information concerning the time each person in the sample is at 
home is obtained for six moments, it is convenient to think of the popu- 
lation as being divided into seven groups which correspond to the actual 
and potential answers regarding the number of nights they were at 
home. The following seven groups are, therefore, defined: 


TABLE I 


STRATA BASED ON THE NUMBER OF NIGHTS ON WHICH INDIVIDUALS ARE 
AT HOME AT A SPECIFIED MOMENT 



































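No. of nights      Proportion of          Size of    Estimated      Size of    Pop.      Sample
at home of six     time at home           pop.       size of pop.   sample     total     total
     (i)           Est.    Act.

      0            0/6     $\bar{p}_0$    $N_0$      -              -          $X_0$     -
      1            1/6     $\bar{p}_1$    $N_1$      $\hat{N}_1$    $n_1$      $X_1$     $\hat{X}_1$
      2            2/6     $\bar{p}_2$    $N_2$      $\hat{N}_2$    $n_2$      $X_2$     $\hat{X}_2$
      3            3/6     $\bar{p}_3$    $N_3$      $\hat{N}_3$    $n_3$      $X_3$     $\hat{X}_3$
      4            4/6     $\bar{p}_4$    $N_4$      $\hat{N}_4$    $n_4$      $X_4$     $\hat{X}_4$
      5            5/6     $\bar{p}_5$    $N_5$      $\hat{N}_5$    $n_5$      $X_5$     $\hat{X}_5$
      6            6/6     $\bar{p}_6$    $N_6$      $\hat{N}_6$    $n_6$      $X_6$     $\hat{X}_6$
Groups one through six     $\bar{p}$      $N$        $\hat{N}$      $n$        $X$       $\hat{X}$
Groups zero through six    $\bar{p}_t$    $N_t$      -              $n$        $X_t$     -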








The size of the population in each group, $N_i$ ($i = 1, 2, \ldots, 6$), is, of course,
unknown. However, the $N_i$ are rigidly defined. For example, the $N_1$
persons in Group one include all persons whose answers indicated that 
they were at home only on the night the interviewer called and on none 
of the other five nights, at the specified time. In addition, the popula- 
tion in Group one includes all persons, not at home when the inter- 
viewer called, who could have truthfully given such an answer for the 
previous five nights if they had been asked the questions at the par- 
ticular moment that the interviewer called at their respective homes. 
The zero group is made up entirely of the latter, since a person who was 
not home at least one night in the specified six could not fall into the 
sample. The seven groups are mutually exclusive and exhaustive and 
are sampled independently except for the zero group which is not 
sampled at all. The other six groups may be properly treated as strata. 
It is assumed that the population is large enough and that the distribution
among the population of the variate $p_j$ is such that the sample
does include some persons in each of the groups one to six. It will be
noted that the $N_i$ are precisely defined only after the time of visiting
each respondent is fixed, and that a change in the schedule of calls will
change the numbers and identity of the persons falling in each of the
seven groups. In this sense, the $N_i$ are variates, having a sampling distribution,
which assume definite values whenever the time for visiting
all respondents is selected. With these conditions in mind, consider the
sample estimate ($\hat{X}$) of the population total ($X$) for the characteristic
under study.


$$\hat{X} = 6\sum_{j=1}^{n}\frac{x_j}{i} \tag{4}$$

where $x_j$ is the value of the variate under study for the $j$th person, $n$ is
the number of persons found at home and $i$ is the number of nights any
individual is at home out of the six nights including and just preceding
the night of the interview. Thus, the value of the variate for each individual
in the sample is weighted by the reciprocal of the estimated per
cent of time he is at home.

The usefulness of this entire technique for eliminating callbacks depends
largely upon whether or not $\hat{X}$ is a “good” estimate in the sense
that: (1) it is unbiased and (2) has a small sampling error for the class
of population for which it is appropriate.


Mean of the Sample Estimate

It follows from equation (4) that:

$$E\hat{X} = 6E\sum_{j=1}^{n}\frac{x_j}{i} = 6\sum_{j=1}^{N_t} p_j E_j\frac{x_j}{i} \tag{5}$$

where $p_j$ is the probability that the $j$th individual is at home when the
interviewer calls, and $E_j(x_j/i)$ denotes the conditional expectation of
$x_j/i$ knowing that the $j$th person is found at home and interviewed. The
value of $x_j/i$, for each person interviewed, depends upon how many
nights out of the previous five nights that person was at home. Thus
we have,

$$E_j\frac{x_j}{i} = \sum_{i=1}^{6}\frac{x_j}{i}\,\frac{5!}{(i-1)!\,(6-i)!}\,p_j^{\,i-1}q_j^{\,6-i} \tag{6}$$

in which the coefficients of $x_j/i$ in the six terms of the summation are,
respectively, the probability that the $j$th person$^{9}$ was at home 0, 1, 2,
..., 5 previous nights; viz., the six terms in the expansion of $(q_j + p_j)^5$
where $q_j = 1 - p_j$. Equations (5) and (6) readily yield for $E\hat{X}$:


$$E\hat{X} = \sum_{i=1}^{6}\sum_{j=1}^{N_t} P_{ij}x_j \tag{7}$$

where $P_{ij}$ is the probability that the $j$th individual falls in the $i$th stratum, i.e.,

$$P_{ij} = \frac{6!}{i!\,(6-i)!}\,p_j^{\,i}(1-p_j)^{6-i}$$

is the $(i+1)$th term in the expansion of $(q_j + p_j)^6$. Since the sum of the
seven terms in this expansion equals unity, we have,

$$\sum_{i=1}^{6} P_{ij} = 1 - q_j^{6}. \tag{8}$$

Making this substitution in (7), the expected value of $\hat{X}$ becomes:

$$E\hat{X} = \sum_{j=1}^{N_t} x_j(1 - q_j^{6}) = \sum_{j=1}^{N_t} x_j - \sum_{j=1}^{N_t} q_j^{6}x_j. \tag{9}$$


But $q_j^{6}$ is the probability that the $j$th individual will not be at home on
any of the six specified nights. The second term of (9), therefore, is the
expected value of the variate for the zero group, thus,

$$E\hat{X} = \sum_{j=1}^{N_t} x_j - \sum_{j=1}^{N_t} q_j^{6}x_j = EX_t - EX_0 = X. \tag{10}$$
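
The step from the weighted sum to the factor one minus the sixth power of the not-at-home probability can be verified exactly for one person. The Python sketch below is illustrative; it assumes, as the derivation does, that the six nights are independent, and the function name `expected_contribution` is hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_contribution(x, p, nights=6):
    """Expected contribution of one person to the weighted sample estimate.

    The person is at home on the interview night with probability p and, if
    so, was at home on i-1 of the five previous nights with binomial
    probability; the reported value x is then weighted by 6/i.
    """
    q = 1 - p
    total = Fraction(0)
    for i in range(1, nights + 1):
        prob_previous = comb(nights - 1, i - 1) * p ** (i - 1) * q ** (nights - i)
        total += p * prob_previous * Fraction(nights, i) * x
    return total


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(expected_contribution(5, p), 5 * (1 - (1 - p) ** 6))
```

The two printed values agree for any probability between zero and one, which is the content of the substitution leading to the unbiasedness result for all persons outside the zero group.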


It is clear that $\hat{X}$ is an unbiased estimate of $X$, the total value of the
variate for all persons in the population other than those who cannot
ordinarily be found at home in six visits. Although $X$ is a constant, the
$X_i$ will vary in successive samples according to the number and identity
of the individuals which fall in each stratum for a particular arrangement
of the interviewers’ schedule of visits. As a corollary of the above
proof, however, it can be shown that $\hat{X}_i$ is an unbiased estimate of the





9 The assumption is implied that the six moments are independently selected at random. If the
particular questions suggested in Part I are used, the six moments in question are not independently
selected at random, but systematically selected within a randomly selected cluster of six successive
nights. An exact statement of probability would involve the intra-class correlation of the probability
of an individual being at home on successive nights. However, experience has indicated that this correlation
tends to be quite low or even negative because a person is more apt to stay at home on a night
following a night on which he goes out. The assumption of a zero correlation is, therefore, realistic.
aggregate value of the variate for the $i$th stratum, i.e., $E\hat{X}_i = EX_i$.$^{10}$
By definition, we have:

$$X_i = \sum_{j=1}^{N_i} x_j. \tag{11}$$

It follows directly that

$$EX_i = \sum_{j=1}^{N_t} P_{ij}x_j. \tag{12}$$

Since this expression is identical to that given within the summations
($i = 1, 2, \ldots, 6$) in equation (7), clearly $E\hat{X}_i = EX_i$. It is also evident
that $E\hat{N}_i = EN_i$ and that $E\hat{N} = N$.

Brief attention must be given to the zero group which inevitably 
contributes a bias to the results of any population survey. Any estimate 
for this excluded group is necessarily based on an assumption, for 
example, that $\bar{X}_0 = \bar{X}_1$, or where the variate under study is highly correlated
with the tendency to be away from home, the assumption
might be that $\bar{X}_0 = a + b\bar{p}_0$, in which ($a$) and ($b$) are the constants in
the regression between ($x$) and ($p$) as determined from the sample.
This latter assumption might be warranted if the zero group is large 
and one is estimating, for example, the number of nights each week 
individuals go to the movies, or listen to the radio. The regression estimate
has one clear advantage: the extent of the bias because of “not-at-homes”
depends upon the correlation between ($p$) and ($x$). If they





10 A necessary condition for $\hat{X}$ and the $\hat{X}_i$ to be unbiased is the random selection of a time for
each visit. If the number and identity of the $N_i$ individuals in each stratum were determined, for
example, by an arbitrary arrangement of the interviewers’ schedule of visits, the $\hat{X}_i$ and $\hat{X}$ become
biased estimates. Under these conditions

$$\hat{X} = \sum_{i=1}^{6}\hat{X}_i = 6\sum_{i=1}^{6}\sum_{j=1}^{n_i}\frac{x_{ij}}{i}$$

and

$$E\hat{X} = 6\sum_{i=1}^{6}\sum_{j=1}^{N_i}\delta_{ij}\frac{x_{ij}}{i} \neq X$$

where $\delta_{ij}$ equals unity or zero according to whether or not the $j$th person in the $i$th stratum is at home
at a moment taken arbitrarily.
If the exact probability $p_j$ were known for all the persons and the visits are made at a random time
the estimate

$$\sum_{j=1}^{n}\frac{x_j}{p_j}$$

is an unbiased estimate for the entire population including the zero group, for
are uncorrelated, the bias vanishes. Where the correlation in the population
is linear, this estimate is unbiased. It is not, however, susceptible
to proof from the sample itself that the relationship is linear within the 
zero interval even though extremely strong evidence is afforded by the 
other six groups. The problem of estimating for this group is no differ- 
ent than that found in a callback operation. The size of the zero group 
and the consequent bias will be reduced, of course, by discovering 
whether or not respondents are at home on more than five previous 
nights, just as it is reduced by making more than five callbacks. 


Variance of the Sample Estimate 


Our next interest is in the variance of the sample estimate, $\sigma^2_{\hat{X}}$,
where the sample, according to the conditions of the experiment, depends
entirely upon the number and identity of the persons at home
when the interviewer calls. For the more usual case in which the entire
group of $N_t$ persons itself constitutes a sample from a larger population,
$\sigma^2_{\hat{X}}$ becomes a separate contribution to the total variance. Suppose
that $K\hat{X}$ is used as an estimate of $K\bar{X}$, the aggregate value of the
variate for a population out of which the $N_t$ persons have been selected
as a sample. The total variance of this estimate, $K^2s^2_{\hat{X}}$, is given by:

$$K^2s^2_{\hat{X}} = K^2E(\hat{X} - \bar{X})^2 = K^2E[(\hat{X} - X) + (X - \bar{X})]^2 = K^2[E(\hat{X} - X)^2 + E(X - \bar{X})^2] = K^2(\sigma^2_{\hat{X}} + \sigma^2_X)$$

where $\sigma^2_X$ is the contribution to the variance arising out of any other
sampling operations.

Our main interest, therefore, is in the contribution to the variance
arising from the elimination of callbacks. By definition and by equation
(10) we have,

$$\sigma^2_{\hat{X}} = E\hat{X}^2 - (E\hat{X})^2 = E\hat{X}^2 - X^2. \tag{13}$$
From equation (4) the $E\hat{X}^2$ is given by,

$$E\hat{X}^2 = E\left(6\sum_{j=1}^{n}\frac{x_j}{i}\right)^2 = 36E\sum_{j=1}^{n}\frac{x_j^2}{i^2} + 36E\sum_{\substack{j,k=1\\ j\neq k}}^{n}\frac{x_jx_k}{i_ji_k}. \tag{14}$$

The first term of (14) may be evaluated directly as in equation (5).

$$36E\sum_{j=1}^{n}\frac{x_j^2}{i^2} = 36\sum_{j=1}^{N_t} p_jE_j\frac{x_j^2}{i^2} \tag{15}$$

in which, as before, $E_j$ denotes the conditional expectation knowing
that the $j$th individual is interviewed. Hence,

$$E_j\frac{x_j^2}{i^2} = \sum_{i=1}^{6}\frac{x_j^2}{i^2}\,\frac{5!}{(i-1)!\,(6-i)!}\,p_j^{\,i-1}q_j^{\,6-i}. \tag{16}$$

Equations (15) and (16) readily yield

$$36E\sum_{j=1}^{n}\frac{x_j^2}{i^2} = 6\sum_{i=1}^{6}\sum_{j=1}^{N_t}\frac{P_{ij}x_j^2}{i} \tag{17}$$

where $P_{ij}$ is defined following equation (7) above.

The second term of (14) is the sum of all possible combinations,
taken two at a time, of the weighted variate ($x_j$) for persons falling in
the sample. The expected value of this term, therefore, is the sum of all
such possible combinations for the entire population multiplied by
their respective probabilities of occurrence. First, we note that the
probability that the $j$th person is at home on any night in no way affects
the probability for the $k$th person, that is, $x_j$ and $x_k$ are independent and
$Ex_jx_k = (Ex_j)(Ex_k)$. Thus, for the second term of (14), we have,

$$\begin{aligned}
36E\sum_{\substack{j,k=1\\ j\neq k}}^{n}\frac{x_jx_k}{i_ji_k} &= 36\sum_{\substack{j,k=1\\ j\neq k}}^{N_t}\left(p_jE_j\frac{x_j}{i}\right)\left(p_kE_k\frac{x_k}{i}\right)\\
&= \left[6\sum_{j=1}^{N_t}p_jE_j\frac{x_j}{i}\right]^2 - \sum_{j=1}^{N_t}\left(6p_jE_j\frac{x_j}{i}\right)^2.
\end{aligned} \tag{18}$$





The first term of (18) has been shown to be equal to $X^2$ (equations (5)
through (10), above). Similarly, it was shown that,

$$6p_jE_j\frac{x_j}{i} = x_j(1 - q_j^{6}). \tag{19}$$

These substitutions for the second term of (14), together with the right
member of (17), for the first term of (14) produce the following expression
for $E\hat{X}^2$:

$$E\hat{X}^2 = 6\sum_{i=1}^{6}\sum_{j=1}^{N_t}\frac{P_{ij}x_j^2}{i} + X^2 - \sum_{j=1}^{N_t}x_j^2(1 - q_j^{6})^2. \tag{20}$$

Substituting in (13) and simplifying, we have,

$$\sigma^2_{\hat{X}} = \sum_{j=1}^{N_t}x_j^2\left\{6\sum_{i=1}^{6}\frac{P_{ij}}{i} - (1 - q_j^{6})^2\right\}. \tag{21}$$
















In almost all sampling problems of the type for which this technique is 
appropriate, the variances in the population are unknown and it is 
necessary to substitute values from the sample to compute the sampling 
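error of the sample estimate. In this case $\hat{p}_k = k/6$ is the sample estimate
of $p_j$ for all persons in the $k$th stratum and $\hat{q}_k = 1 - \hat{p}_k$. The sample
estimate of $\sigma^2_{\hat{X}}$ then becomes,

$$\hat{\sigma}^2_{\hat{X}} = 6\sum_{k=1}^{6}\sum_{j=1}^{n_k}\frac{x_{kj}^2}{k}\left[6\sum_{i=1}^{6}\frac{\hat{P}_{ki}}{i} - (1 - \hat{q}_k^{\,6})^2\right] \tag{22}$$

where $\hat{P}_{ki}$ is the sample estimate of $P_{ij}$ obtained by substituting $\hat{p}_k$ and
$\hat{q}_k$ for $p_j$ and $q_j$. Since there are only six possible values of $\hat{p}_k$ and $\hat{q}_k$, as
listed in Table I, the six values of

$$\frac{6}{k}\left[6\sum_{i=1}^{6}\frac{\hat{P}_{ki}}{i} - (1 - \hat{q}_k^{\,6})^2\right]$$

have been worked out for use in estimating the variance of any sample
estimate based on this plan. They are shown in Column Three of Table
II. The sum of Column Five is the sample estimate of the variance,
$\hat{\sigma}^2_{\hat{X}}$.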











TABLE II

Stratum (k)     6/k     Coefficient     $S_k$*     $\hat{\sigma}^2_{\hat{X}_k}$
    (1)         (2)         (3)          (4)          (5)

     1           6        16.160        $S_1$      16.160 $S_1$
     2           3         6.957        $S_2$       6.957 $S_2$
     3           2         2.802        $S_3$       2.802 $S_3$
     4           1.5       1.027        $S_4$       1.027 $S_4$
     5           1.2        .305        $S_5$        .305 $S_5$
     6           1         0            $S_6$          -

* $S_k = \sum_{j=1}^{n_k} x_{kj}^2$.
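
The Column Three coefficients can be recomputed from the binomial terms. The Python sketch below assumes the coefficient for stratum k is 6/k times the bracketed expression in the variance estimate, evaluated at an at-home probability of k/6; the function names are hypothetical. Under that assumption the results agree with the tabulated figures to within roughly 0.01.

```python
from fractions import Fraction
from math import comb


def variance_coefficient(k, nights=6):
    """Coefficient applied to the sum of squared values in stratum k."""
    p = Fraction(k, nights)   # estimated share of time at home in stratum k
    q = 1 - p
    # Sum over i of P(at home i nights out of six) divided by i.
    inner = sum(comb(nights, i) * p ** i * q ** (nights - i) / i
                for i in range(1, nights + 1))
    return Fraction(nights, k) * (nights * inner - (1 - q ** nights) ** 2)


def estimated_variance(sums_of_squares):
    """Combine {stratum: sum of squared values} into the variance estimate."""
    return sum(float(variance_coefficient(k)) * s
               for k, s in sums_of_squares.items())


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, round(float(variance_coefficient(k)), 3))
```

The coefficient falls quickly as k rises, so most of the added variance comes from respondents who are rarely at home.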


Numerical Examples 


To gain insight into the probable effects of practical applications of 
this plan, it may be helpful to consider a hypothetical population and 
the expected sample which would result. Such a population is shown in 
Table III. The probability density function of the distribution of $p_j$
in this population is $3p_j^2\,dp_j$. This function was selected primarily for
convenience in computing the expected numbers at home 1, 2, ..., 6
nights out of the six in question. It does, however, roughly approximate 
a typical distribution which might be inferred from actual answers to 
the questions regarding nights at home. 
The upper half of Table III shows the relation between the actual
percentage of time persons are at home and the expected numbers at
home 1, 2, ..., 6 nights out of six. The marginal distribution of $p_j$
for the entire population (first line of Table III) is obtained by integrating
successively

$$3N_t\int_a^b p_j^2\,dp_j \qquad (0 \le a \le b \le 1)$$

where $a$ and $b$ are the percentages shown in the heading. The average
per cent of the time all persons were actually at home ($\bar{p}_t$) is given by:

$$\bar{p}_t = Ep_j = 3\int_0^1 (p_j)\,p_j^2\,dp_j = 3\left[\tfrac{1}{4}p_j^4\right]_0^1 = 75\%. \tag{23}$$
0 


The product, $N_t$ multiplied by this integral between the limits ($a$) and
($b$), will yield the expected number falling in sample, who were at home
between $a$ and $b$ per cent of the time. When ($a$) and ($b$) are taken successively
in intervals of 10%, the distribution of ($p_j$) in the sample is
obtained (line opposite “Expected Sample” in Table III). Similarly,
the expected numbers not in the sample are obtained from the integral

$$3N_t\int_a^b (1 - p_j)\,p_j^2\,dp_j = 3N_t\left[\int_a^b p_j^2\,dp_j - \int_a^b p_j^3\,dp_j\right]$$


The expected numbers, in the population, at home 1, 2, ..., 6 nights
out of six are given by:

$$EN_i = 3N_t\int_a^b \frac{6!}{i!\,(6-i)!}\,p_j^{\,i}(1-p_j)^{6-i}\,p_j^2\,dp_j = 3N_t\int_a^b P_{ij}\,p_j^2\,dp_j. \tag{24}$$

The expected numbers, for the population, in the main body of the
table are obtained by using the successive limits ($a$) and ($b$) in intervals
of 10% ($i = 1, 2, 3, \ldots, 6$). The corresponding expected numbers in
the sample are obtained from

$$En_i = 3N_t\int_a^b \frac{5!}{(i-1)!\,(6-i)!}\,p_j^{\,i-1}(1-p_j)^{6-i}\,p_j\cdot p_j^2\,dp_j = 3N_t\left(\frac{i}{6}\right)\int_a^b P_{ij}\,p_j^2\,dp_j. \tag{25}$$
For any integral value of $i$, we have,

$$En_i = \frac{i}{6}\,EN_i. \tag{26}$$

Clearly this relation holds regardless of the limits of integration and
the expected number in the sample in any “cell” is exactly equal to $i/6$
multiplied by the corresponding expected number in the population.
Therefore, by applying the weights $6/i$ to any group in the sample, we
obtain an unbiased estimate of the corresponding group in the population.
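
For the model population this relation can be confirmed with exact arithmetic over the full range of at-home probabilities. The Python sketch below is illustrative: it assumes a total of 1,000,000 persons and a density of three times the squared at-home probability, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a, b):
    """Integral of p**a * (1 - p)**b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def expected_population(i, total=1_000_000):
    """Expected number in the population at home on i of the six nights."""
    # Binomial probability of i nights out of six, times the density 3p^2.
    return 3 * total * comb(6, i) * beta_int(i + 2, 6 - i)


def expected_sample(i, total=1_000_000):
    """Expected number found at home who report i nights at home."""
    # At home on the interview night (p) and on i-1 of the five previous nights.
    return 3 * total * comb(5, i - 1) * beta_int(i + 2, 6 - i)


if __name__ == "__main__":
    for i in range(1, 7):
        print(i, float(expected_population(i)), float(expected_sample(i)))
```

Each expected sample count is exactly i/6 of the corresponding expected population count, so multiplying by 6/i restores the population figure.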

An interesting check, useful in practical applications of the plan, is
afforded by comparing $n/\hat{N}$ with $n/N_t$, since each of these ratios is an
estimate of the average per cent of time all persons in the population
are at home. It is clear that $n/N_t$, the per cent of all individuals visited
who were actually found at home, is an unbiased estimate, where calls
have been scheduled at random. Because $E\hat{N} = N < N_t$, this comparison
indicates the net error resulting from several sources including (1) bias
due to the zero group, (2) sampling error, (3) response bias in reporting
information on previous nights at home, etc. For the population in
Table III, the bias due to zero group equals

$$\text{Bias} = \frac{750{,}000}{988{,}095} - \frac{750{,}000}{1{,}000{,}000} = .759 - .75 = .009.$$


Several other interesting comparisons can be made from this “model” 
population and expected sample. Consider the characteristic $x_j$ which, 
for example, is possessed by all persons in the population who are at 
home more than 80% of the time and by no one else. The expected 
value of the sample estimate $\hat{X}$ is $E\hat{X} = X$ = 487,996 while the population 
value for the entire $N_T$ persons equals 488,000. On the other hand, 
if the characteristic under study were possessed by all persons in the 
population at home less than 20% of the time and no one else, 
$E\hat{X} = X$ = 4,884 while the population value for the entire $N_T$ persons is 
$X_T$ = 8,000.¹¹ This latter substantial bias (3,116 = 39%), of course, would 
also result from a sample in which five callbacks were made. This comparison 
points up the fact that in extreme cases where characteristics 
are possessed almost exclusively by persons who go out frequently, 





11 In many situations, it may be preferable to use the ratio estimate $X' = (N_T \hat{X}/\hat{N})$ instead of $\hat{X}$. 
Where the primary interest is in the per cent $(X_T/N_T)$, the estimate $(\hat{X}/\hat{N}) = (X'/N_T)$ is often to be 
preferred. This estimate contains a bias, which is usually not large, arising from the random variate 
appearing in the denominator. For the two examples, the estimate $X'$ yields, respectively, 493,875 and 
4,943. 
extraordinary effort must be made to reduce the not-at-home bias. 
In such a case, this plan may be coupled with, say, two callbacks to those 
not at home on the first call, in which questions concerning previous 
not-at-home performance are asked, as in the original calls. Application 
of similar projection procedure to these respondents will reduce the 
bias to about 30%. From the standpoint of sampling bias alone, there- 
fore, in an extreme case this technique together with two callbacks will 
reduce the bias of 39% obtained from five callbacks, to a bias of 30%, 
after eliminating three expensive callbacks. 
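
The two extreme-case figures quoted above can be approximated directly. This is a sketch under the same assumption as the model population appears to use (density $3p^2$ for the at-home probability, $N_T$ = 1,000,000), and it attributes the whole bias of the weighted estimate to the zero group, that is, to persons at home on none of the six nights. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

N_T = 1_000_000  # size of the model population


def zero_group_cdf(p: Fraction) -> Fraction:
    """Integral from 0 to p of 3 t^2 (1 - t)^6 dt, by binomial expansion."""
    return sum(
        Fraction(3 * comb(6, k) * (-1) ** k, k + 3) * p ** (k + 3)
        for k in range(7)
    )


def population_total(a: Fraction, b: Fraction) -> Fraction:
    """Persons at home between a and b of the time (density 3p^2)."""
    return N_T * (b ** 3 - a ** 3)


def expected_estimate(a: Fraction, b: Fraction) -> Fraction:
    """Population total minus those at home on none of the six nights."""
    missed = N_T * (zero_group_cdf(b) - zero_group_cdf(a))
    return population_total(a, b) - missed


for a, b in [(Fraction(8, 10), Fraction(1)), (Fraction(0), Fraction(2, 10))]:
    total = population_total(a, b)
    est = expected_estimate(a, b)
    bias = float((total - est) / total)
    print(float(a), float(b), float(total), round(float(est), 1), round(100 * bias, 1))
```

The results should agree with the figures in the text to within about one unit; any remaining difference is presumably a matter of rounding in the original tabulation.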

Obviously, the considerable advantage afforded by this plan in re- 
ducing bias must be modified to some extent by consideration of the 
contribution to sample error because of the failure to include all N 
cases. For the two examples just cited, this sampling error as estimated 
from the sample cases is .00059 and .0225 (coefficient of variation), for 
the cases, (respectively), in which the characteristic is possessed by 
persons at home more than 80% of the time and less than 20% of the 
time. 

In contemplating any particular survey operation, it is necessary to 
compare the possible benefits which might result from the use of the 
nights-at-home questions with those of alternative plans, as for example, 
provision for callbacks to only a sub-sample of the persons not at home. 
The sub-sample, of course, also contributes to the sampling error. Un- 
fortunately, there are few safe generalizations regarding which plan 
will produce the most information for the cost. In a real sense, the two 
plans are equivalent, for the nights-at-home questions result in sub- 
samples of individuals who are usually at home about one night in six, 
two nights in six, and so on. Moreover, the number of sample cases 
obtained for each of these “sub-samples” tends to correspond roughly 
to the numbers that would have been obtained from sub-samples of 
callbacks that were allocated according to optimum conditions con- 
sidering the costs. That is to say, second call interviews cost more than 
first call interviews; and third, fourth, fifth and sixth call interviews 
become progressively more and more expensive so that the optimum 
allocation formula leads to smaller and smaller sub-samples of call- 
backs for each of these groups, respectively. This is because respondents 
found on callbacks tend to be more widely scattered requiring extra 
travel time and expense. 

Since the addition to the variance resulting from the use of the 
nights-at-home question is only one contribution to the over-all vari- 
ance of the estimate, its importance depends upon the efficiency of the 
original sample. If the original sample is both large and highly efficient 
it may pay to obtain additional interviews from persons frequently 
away from home even at a high cost per interview. Experience indicates, 
however, that population samples are seldom efficient enough to war- 
rant the extra cost. This naturally depends also upon the particular 
cost structure of the organization conducting the survey. 

Because of the effects of clustering respondents to save travel costs, 
efficiently conducted population surveys are likely to have more than 
twice the sampling error of a widely scattered unrestricted random 
sample of the same size. By making this assumption, it is possible to 
compare the results of this plan for eliminating callbacks, Plan A, 
with those of an operation providing for initial calls to 10,000 persons 
and up to five callbacks to find the not-at-homes, Plan B. The two 
samples are to be drawn from the population described in Table III. 
The characteristic under study is possessed by about half of the popu- 
lation and is unrelated to the tendency to be at home. Under these 
conditions the most probable results of these two operations are shown 
in Table IV. 

The total numbers of home visits under the two plans are equal. Inasmuch 
as all visits under Plan A are first calls, while about 4,500 visits under 
Plan B are callbacks, Plan B must require more expensive field work. 
Since the sampling error for Plan B is larger than for Plan A, it is ap- 
parent that Plan A yields more information for the costs, under the 
stated assumptions of this example. If the original sample were twice as 
efficient as a random sample, the sampling error for Plan A would be al- 
most exactly equal to that of Plan B, one-fourth of one per cent. Thus, 
Plan A would still yield more information for the cost. The perfect correlation, 
$\rho_{\hat{X}\hat{N}} = 1$, implicit in the assumption that the characteristic 
under study is not related to the frequency with which persons are 
away from home, tends to reduce the sampling error of Plan A some- 
what unrealistically. It is easy to construct hypothetical examples 
based on other assumptions to compare the two plans and possibly to 
appraise the effects of sub-sampling the original not-at-homes. 
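
The sampling errors compared above can be recomputed from the figures printed in Table IV. The sketch below uses the usual first-order approximation for the variance of a ratio and doubles the binomial standard error to allow for clustering, as the text assumes. The function and parameter names, including `design_factor`, are illustrative choices, not notation from the paper.

```python
from math import sqrt


def ratio_variance(x, n, var_x, var_n, rho):
    """Approximate variance of the ratio X/N (first-order expansion)."""
    rel = var_x / x**2 + var_n / n**2 - 2 * rho * sqrt(var_x * var_n) / (x * n)
    return (x / n) ** 2 * rel


def binomial_variance(p, n, design_factor=2.0):
    """Variance of a proportion, with the standard error inflated by design_factor."""
    return design_factor**2 * p * (1 - p) / n


# Plan A: figures taken from Table IV.
n_hat = 14_464 - 172
var_ratio = ratio_variance(7_146, n_hat, 4_443, 8_886, 1.0)
sigma_a = sqrt(var_ratio + binomial_variance(0.5, n_hat))

# Plan B: 9,881 interviews after up to five callbacks.
sigma_b = sqrt(binomial_variance(0.5, 9_881))

print(f"Plan A: {100 * sigma_a:.2f}%  Plan B: {100 * sigma_b:.2f}%")
```

Setting `design_factor=1.0` in both calls gives the comparison for an original sample with half the standard error, the case the text describes as twice as efficient.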

Some allowance should be made for the fact that interviewers are 
frequently able to find out from other members of the family when the 
absent member will be at home and by scheduling callbacks according 
to this information a higher percentage of persons may be found on 
subsequent visits. The primary difficulty here, however, is that call- 
backs necessitate travel to a neighborhood and it is seldom possible 
to schedule the visits to coincide with the various times that several 
persons to be interviewed in the neighborhood are at home. In addition, 
no one will be found at home in many of the households visited and it is 
not easy to obtain reliable information from neighbors. Nevertheless, 
to the extent that such information can be found and utilized the callback 
operations become more effective. 

TABLE IV 

COMPARISON OF THE PROBABLE RESULTS OF A CALLBACK OPERATION WITH 
THOSE OF A PLAN FOR ELIMINATING CALLBACKS 

Plan A: Nights-at-Home Questions                          | Plan B: Callback Operation
No. of nights  Home     Inter-   Number with      X̂       | Call       Home     Not-at-  Inter-
at home out    visits   views    characteristic           |            visits   homes    views
of six                           (x)                      |
(1)            (2)      (3)      (4)              (5)     |            (6)      (7)      (8)
All            14,464   10,849   5,425            7,146   | All calls  14,464   4,583    9,881
6               4,822    4,822   2,411            2,411   | 1st call   10,000   2,500    7,500
5               3,616    3,014   1,507            1,808   | 2nd call    2,500   1,000    1,500
4               2,583    1,722     861            1,292   | 3rd call    1,000     500      500
3               1,722      861     431              861   | 4th call      500     286      214
2               1,032      344     172              516   | 5th call      286     178      108
1                 517       86      43              258   | 6th call      178     119       59
0                 172        -       -                -   |

Plan A: 

$$\hat{P} = \frac{\hat{X}}{\hat{N}} = \frac{7{,}146}{14{,}464 - 172} = 50\%$$

$$\sigma_{\hat{X}}^2 = 4{,}443, \qquad \sigma_{\hat{N}}^2 = 8{,}886, \qquad \rho_{\hat{X}\hat{N}} = 1.00$$

$$\sigma_{\hat{X}/\hat{N}}^2 \doteq \frac{X^2}{N^2}\left(\frac{\sigma_{\hat{X}}^2}{X^2} + \frac{\sigma_{\hat{N}}^2}{N^2} - \frac{2\rho_{\hat{X}\hat{N}}\,\sigma_{\hat{X}}\sigma_{\hat{N}}}{XN}\right) \doteq .00000186$$

$$\sigma_{\hat{P}}^2 \doteq \sigma_{\hat{X}/\hat{N}}^2 + 4\left[\frac{(.50)(.50)}{14{,}292}\right] \doteq .00000186 + .00006997 \doteq .00007183$$

$$\sigma_{\hat{P}} \doteq .85\%$$

Plan B: 

$$p = 50\%, \qquad \sigma_p = 2\sqrt{\frac{(.50)(.50)}{9{,}881}} = 1.006\%$$

Other considerations pertinent to a decision regarding the possible 
use of the nights-at-home questions may be mentioned briefly, as follows: 
(a) the efficiency of the original sample, (b) the number of callbacks 
permitted by the budget, (c) the time available in which to produce 
the final survey results, (d) the relationship between the variate 
under study and the tendency to be away from home, (e) the time of 
day during which interviews are to be conducted, (f) the particular 
population group under study, e.g., farmers, housewives, car owners, 
men over 21 years old, and so on. It should prove to be especially help- 
ful in the further development of this plan to have reports from others 
concerning their experience with it, as well as any suggestions for modi- 
fications or improvements in the particular techniques described above. 





APPLICATION OF LEAST SQUARES REGRESSION 
TO RELATIONSHIPS CONTAINING AUTOCORRELATED 
ERROR TERMS* 


D. Cochrane and G. H. Orcutt 
Department of Applied Economics, Cambridge 


We point out that autocorrelated error terms require modi- 
fication of the usual methods of estimation and prediction; 
and we present evidence showing that the error terms involved 
in most current formulations of economic relations are highly 
positively autocorrelated. In doing this we demonstrate that 
when estimates of autoregressive properties of error terms are 
based on calculated residuals there is a large bias towards 
randomness. We demonstrate how much efficiency may be lost 
by current methods of estimation and prediction; and we give 
a tentative method of procedure for regaining the lost effi- 
ciency. 


INTRODUCTION 


THREE MAJOR complications may be distinguished in the statistical 
measurement of relationships between economic time series: 

1. The existence of simultaneous relationships between the vari- 
ables. 

2. The presence of auto-correlated error terms. This has been less 
exactly called the time-series complication. 

3. The presence of errors of observation in each of the variables. 

The first complication was forcefully brought to the attention of 
economists by Frisch [1] and Haavelmo [2]; and much work has since 
been done by Koopmans [3] and others [4] in finding the structural 
parameters when the economic variables are described by a system of 
simultaneous equations. This approach is very promising but the time- 
series complication has been assumed away by the specification that 
the error terms which enter into each equation are independent in suc- 
cessive periods of time. 

A considerable amount of work has also been devoted to problems 
relating to the second complication. The rather extensive literature 
connected with the variate difference method, conveniently summarized 
by Tintner [5], and also the general analysis of economic time trends 





* We wish to express our thanks for the considerable assistance we have received from Richard 
Stone. 
may be included under this heading. More directly related to the prob- 
lem are the studies which examine the distribution of correlations be- 
tween autocorrelated series, [6] the major proportion of which are 
devoted to tests for the null hypothesis. Of those papers which are 
concerned with the measurement of functional relationships between 
series, few make it clear that the significant factor in the analysis is the 
autocorrelation of the error term and not the autocorrelation of the 
time series themselves. This fact has been well expressed by Aitken, [7] 
but its importance seems to have escaped the attention of economists. 
We should also refer to a paper by Champernowne [8] which became 
available after this study was essentially complete. Champernowne’s 
paper recovers much of the ground developed by Aitken and is an ex- 
ceedingly useful study, carrying the problem into the field of statistical 
estimation and sampling theory. 

The third complication arises when the assumption that the explana- 
tory variables are measured free from error cannot be maintained, and 
may therefore be a problem of some importance when considering eco- 
nomic data. In the absence of a complete knowledge of the correlation 
matrix of the errors, simplifying assumptions that the errors in each of 
the variables are random and uncorrelated both with the systematic 
part of each variable and with the errors in the other variables must be 
made. The problems involved have received consideration in the work 
of Frisch [9], Koopmans [10], Tintner [11], Reiersøl [12] and Geary 
[13]. 

The objects of this paper are four-fold. First, we wish to focus the 
attention of economists on the fact that the presence of autocorrelated 
error terms requires some modification of the usual least squares 
method of estimation; and secondly, we wish to show that there is 
strong evidence in favour of the view that the error terms involved in 
most current formulations of economic relations are highly positively 
autocorrelated. In doing this we demonstrate the presence of a large 
bias towards randomness in estimates of the autoregressive properties 
of error terms which are based on calculated residuals. Third, we indicate 
roughly how much efficiency is lost by current methods of estimation 
and prediction if error terms are highly autocorrelated; and finally 
we present a tentative method of procedure. 

In arriving at our conclusions we have placed considerable reliance 
on results obtained from a number of sampling experiments. We recog- 
nize that results arrived at by this procedure may not have the elegance 
or all of the utility of results obtained deductively from the same as- 
sumptions; nevertheless this method of approach is a legitimate one 
and frequently makes it possible to obtain useful answers to problems 
which have proved stubborn to mathematical statisticians. In this 
connection it might be noticed that there are a large number of impor- 
tant questions in the field of statistics which in principle could be an- 
swered deductively but which have till the present time proved too 
difficult. Most of these questions could be answered by sampling experi- 
ments and it is to be hoped that, as improved calculating equipment 
becomes available, more attention will be given to this approach. 

In order to concentrate on the problem of auto-correlated errors, we 
have ignored the difficulties arising from the simultaneous equations 
complication and the errors in the variables complication. However, 
it should be obvious that for the purpose of estimating structural param- 
eters it is necessary to find a method of dealing simultaneously with 
all three complications, or at least some indication of their relative 
importance. A consideration of some aspects of the difficulties to be 
experienced in analysing relationships when more than one of these 
complications is present is contained in a following paper [14]. 





REGRESSION ANALYSIS WITH AUTOCORRELATED ERROR TERMS 
OF KNOWN AUTOREGRESSIVE PROPERTIES 


It may be helpful to restate briefly the assumptions underlying the 
method of least squares. Suppose a single linear relationship exists 
between the variables $x_{1t}, x_{2t}, \ldots, x_{kt}$ of the form 

$$x_{1t} = a + \sum_{j=2}^{k} b_{1j}\, x_{jt} + u_t \tag{2.1}$$

where $u_t$ is a random error term with constant variance, while the $a$ 
and the $b$'s are constants to be determined. Provided the $x_{2t}, \ldots, x_{kt}$ 
are independent of the random error term $u_t$, then the best linear 
unbiassed estimates of these coefficients are given by the method of least 
squares, best estimates meaning those estimates which have a minimum 
variance. This is true even if the independent variables are autocorre- 
lated, provided we can consider them as fixed in repeated samples [15]. 
If in addition the error term is normally distributed then the least 
squares estimates are maximum likelihood estimates [16]. 

In many economic relationships it is an oversimplification to assume 
that error terms are independent in time. If we have a relationship in 
which the error term is autocorrelated, it has been shown by Aitken [17] 
that the method of least squares still yields the best linear unbiassed 
estimates of the regression coefficients provided the lack of independ- 
ence in the error series is taken into account. One method of overcoming 
this lack of independence is to make the error term random by trans- 
forming all the variables according to the autoregressive structure of 
the error term. Suppose we have a linear relationship given by 


$$y_t = a_0 + a_1 x_t + u_t \tag{2.2}$$

where $u_t$ is generated by the Markoff scheme 

$$u_t = \beta u_{t-1} + \epsilon_t \tag{2.3}$$

with random disturbances $\epsilon_t$ and a known autoregression coefficient $\beta$. 
We may substitute for $u_t$ in equation (2.2) and obtain 

$$y_t' = a_0' + a_1 x_t' + \epsilon_t \tag{2.4}$$

where 

$$y_t' = y_t - \beta y_{t-1} \quad \text{and} \tag{2.5}$$

$$x_t' = x_t - \beta x_{t-1} \tag{2.6}$$

and the application of least squares to equation (2.4) will produce best 
linear unbiassed estimates of the regression coefficients $a_0'$ and $a_1$. 
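
The transformation in (2.4) to (2.6) can be sketched in a few lines. The example below assumes a known $\beta$ different from 1 and uses the fact, implied by the substitution, that $a_0' = a_0(1 - \beta)$. The data are simulated with arbitrary illustrative values ($a_0$ = 10, $a_1$ = 2, $\beta$ = 0.8), and the helper names are hypothetical rather than taken from the paper.

```python
import random


def ols(x, y):
    """Least squares intercept and slope for one explanatory variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def quasi_difference(series, beta):
    """Return z_t - beta * z_{t-1} for t = 2, ..., T (eqs. 2.5 and 2.6)."""
    return [cur - beta * prev for prev, cur in zip(series, series[1:])]


def fit_known_beta(x, y, beta):
    """Estimate a_0 and a_1 of (2.2) by least squares on the transformed (2.4)."""
    a0_prime, a1 = ols(quasi_difference(x, beta), quasi_difference(y, beta))
    return a0_prime / (1 - beta), a1  # a_0' = a_0 (1 - beta); needs beta != 1


# Illustrative data: a_0 = 10, a_1 = 2, beta = 0.8 (all assumed values).
rng = random.Random(0)
beta, a0, a1 = 0.8, 10.0, 2.0
x = [rng.uniform(0, 50) for _ in range(40)]
u = [0.0]
for _ in range(39):
    u.append(beta * u[-1] + rng.gauss(0, 1))
y = [a0 + a1 * xi + ui for xi, ui in zip(x, u)]

print("untransformed:", ols(x, y))
print("transformed:  ", fit_known_beta(x, y, beta))
```

One observation is lost in the differencing, so a series of T items yields T - 1 transformed pairs.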

It is also possible to improve on the ordinary methods of prediction 
when the error terms are autocorrelated. If we wish to estimate $y_t$ from 
a given $x_t$ it can be seen that equation (2.2) is not the most efficient 
form in which to make this estimation. A more appropriate form 
would be to use the relation 

$$\hat{y}_t = \hat{a}_0' + \hat{a}_1 (x_t - \beta x_{t-1}) + \beta y_{t-1} \tag{2.7}$$

where $\hat{a}_0'$ and $\hat{a}_1$ are estimated from (2.4). In a later section we shall 
illustrate the gain to be achieved by using this relation in problems of 
estimation. 
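
A small numerical illustration of (2.7) follows. It is a sketch with assumed coefficient values ($a_0$ = 3, $a_1$ = 2, $\beta$ = 0.5) and with the new disturbance set to zero so that the arithmetic can be followed by hand. The function names are illustrative.

```python
def predict_plain(a0, a1, x_t):
    """Prediction from (2.2), ignoring the autocorrelation of the errors."""
    return a0 + a1 * x_t


def predict_with_lag(a0_prime, a1, beta, x_t, x_prev, y_prev):
    """Prediction from (2.7), using the previous observation (x_prev, y_prev)."""
    return a0_prime + a1 * (x_t - beta * x_prev) + beta * y_prev


# Illustrative values (assumptions): a_0 = 3, a_1 = 2, beta = 0.5.
a0, a1, beta = 3.0, 2.0, 0.5
x_prev, u_prev = 4.0, 6.0          # last period: regressor and error term
y_prev = a0 + a1 * x_prev + u_prev
x_t = 5.0
u_t = beta * u_prev                # error carried forward, no new disturbance
y_t = a0 + a1 * x_t + u_t

print(y_t, predict_plain(a0, a1, x_t),
      predict_with_lag(a0 * (1 - beta), a1, beta, x_t, x_prev, y_prev))
```

The plain prediction misses by the carried-over error $\beta u_{t-1}$, while the lagged form recovers it from the previous observation; with a nonzero disturbance only the new random element would remain unexplained.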

In the discussion which follows it is convenient to restrict the mean- 
ing of error term to the true series of errors in a relationship, that is the 
series of errors which would be obtained if the true values of the re- 
gression parameters were applied in the relationship. To distinguish 
the discrepancies actually obtained from the true errors we shall call 
them residuals. In addition, we shall limit the word disturbance to de- 
scribe the random elements in an autoregressive equation. 





1 A more complete statement of this solution is to be found in Section VI. 
AUTOCORRELATION OF ERROR TERMS AND RESIDUALS OF ECONOMIC 
AND CONSTRUCTED RELATIONSHIPS 


In this section we develop the argument that the error terms in 
many if not most current formulations of economic relations are highly 
positively autocorrelated, but it should be stressed that we are not 
trying to prove that this must be so in every case or that it is impossible 
to formulate relations in which the error terms are random. Since the 
autocorrelation properties of economic time series will frequently arise 
in this section, we should first like to refer to a study by Orcutt [18] 
in which it is shown that the fifty-two series used in Tinbergen’s [19] 
model of the economic system of the United States might be considered 
to have been obtained by drawings from a single population of linear 
stochastic series having the same underlying autoregressive structure. 
The underlying autoregressive equation was estimated to be of the form 


$$x_{t+1} = x_t + 0.3(x_t - x_{t-1}) + \epsilon_{t+1} \tag{3.1}$$


where the $\epsilon$'s are random disturbances. The high positive autocorrelation 
of economic time series which (3.1) implies is a feature which should 
not be overlooked. 

Turning to the error terms, let us investigate their sources and see if 
there is reason to believe that the error terms also are likely to be highly 
positively autocorrelated. We can examine their sources under three 
main headings. 


(1.) Systematic errors may arise from a faulty choice of the form of 
relationship assumed to exist between economic variables. Since the 
economic variables are positively autocorrelated, then in general errors 
of this type will be positively autocorrelated. Further the shortness of 
most available time series makes the statistical results meaningless if 
very complicated relationships are adopted, so that errors of this type 
are inevitable. 


(2.) Error terms may arise owing to the omission of variables, both 
economic and non-economic, from the analysis. Important variables 
may be omitted either because they are not available or because their 
importance is not realized. Furthermore, because of the brevity of 
available time series, it is also frequently necessary to neglect variables 
which individually have but a small influence. Nevertheless, it is evi- 
dent that the total influence of a number of such variables may be 
very substantial and highly positively autocorrelated.2 Now, as already 
indicated, there is strong evidence in favour of believing that 
most economic time series are highly positively autocorrelated. There- 
fore, in so far as the omitted variables are economic time series, we may 
expect the resulting error terms to be highly positively autocorrelated. 

Consider also the case of non-economic variables which are likely to 
influence economic behaviour but which are generally omitted. Some 
of those that more readily come to mind are population and its age, 
sex and spatial distribution, changes in cultural patterns, technological 
developments, exploitation and exhaustion of mineral resources includ- 
ing changes in soil fertility, and climatic conditions. Most of the above 
series have very high positive autocorrelations but even where the 
autocorrelations are not high, as in the case of at least certain climatic 
conditions, it is evident that their impact on the economic system is still 
likely to be autocorrelated. Thus even if rainfall was really a random 
series, the water level in the soil, being the result of rainfall over several 
years, would be positively autocorrelated. We might recall in this re- 
spect the correlograms given by Wold [20] of the average yearly rain- 
fall during the period 1867 to 1936 of four cities in or near the drainage 
basin of Lake Väner and the average annual water level (obtained from 
quarterly observations) of Lake Väner from 1867 to 1936. The correlogram 
of the yearly rainfall indicated a random series while that of the 
level of the lake indicated a positively autocorrelated series showing 
that, whilst the occurrence of certain meteorological factors may be 
random, their general influence over time may be systematic. 

Now it may be reasonably argued that the economic behaviour of 
individuals is not completely dependent on economic variables or 
non-economic variables of the type we have mentioned, and that, even 
if an explanation incorporated in the correct manner as many as nec- 
essary of these variables, it would still not yield perfectly correct pre- 
dictions.* No doubt this is true, and the explanatory variables needed 





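2 This may be shown as follows. If we have two unrelated autocorrelated series $x_t$ and $y_t$ whose first 
autocorrelations are given by 

$$\frac{\operatorname{cov}(x_t, x_{t-1})}{\operatorname{var}(x)} \quad \text{and} \quad \frac{\operatorname{cov}(y_t, y_{t-1})}{\operatorname{var}(y)},$$

then if $z_t = x_t + y_t$ the first autocorrelation of $z$ is given by 

$$\frac{\operatorname{cov}(x_t, x_{t-1}) + \operatorname{cov}(y_t, y_{t-1})}{\operatorname{var}(x) + \operatorname{var}(y)}.$$
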
This result may be generalized to show that the sum of any number of autocorrelated series is also 
autocorrelated with its first autocorrelation equal to the sum of the first lag covariances of the individual 
series divided by the sum of the individual variances. 
*See for example T. Haavelmo, “The Probability Approach in Econometrics,” op. cit. Section 11. 
to complete the explanation may be of an approximately random char- 
acter since they relate to such things as the physiological processes of 
each individual. However, it would be a mistake to infer from this that 
economic time series contain a significant random component, for what 
will obviously happen when the behaviour of a large number of indivi- 
duals is averaged is that those actions of individuals which are posi- 
tively correlated with the actions of others will dominate the average 
while those actions which are random for each individual and uncorre- 
lated as between individuals will be averaged out. 

(3.) The series of data used may not measure exactly what is re- 
quired for the particular analysis. In so far as the discrepancy is one of 
coverage, it seems reasonable to believe that the error term involved 
will have much the same autoregressive properties as economic series 
in general. In so far as the discrepancy is more nearly what might perhaps 
be called a pure error of observation, it would appear more difficult 
to say anything about whether or not it is autocorrelated. However, 
on the basis of discussions with economists engaged in the construction 
of basic economic data, we have formed a very strong impression 
that, if an error is committed one year, it will very likely be committed 
again the next year and that most errors of observation are 
positively autocorrelated. 

Let us now see whether our theory is plausible by making a brief 
examination of the autocorrelations of the residuals obtained in several 
econometric studies. These are two papers by Lawrence R. Klein, 
“The Use of Econometric Models as a Guide to Economic Policy,” 
[21] and “Economic Fluctuations in the United States 1921-1941” 
[22]; a paper by M. A. Girshick and Trygve Haavelmo [23] and a paper 
by Richard Stone [24]. The measure of autocorrelation used is the ratio 
of the mean square successive difference to the variance of the residuals. 
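This ratio is generally denoted by $\delta^2/s^2$ [25] where $\delta^2$ and $s^2$ are defined 
by 

$$\delta^2 = \frac{1}{N-1} \sum_{t=1}^{N-1} (x_{t+1} - x_t)^2 \tag{3.2}$$

$$s^2 = \frac{1}{N} \sum_{t=1}^{N} (x_t - \bar{x})^2, \tag{3.3}$$

where $\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t$. 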
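
The statistic defined in (3.2) and (3.3) is simple to compute. The following is a minimal sketch; the function name `delta2_over_s2` is an illustrative choice, and the two short series are made-up inputs intended only to show the direction in which the ratio moves.

```python
def delta2_over_s2(x):
    """Ratio of mean square successive difference to variance (eqs. 3.2, 3.3)."""
    n = len(x)
    mean = sum(x) / n
    delta2 = sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    s2 = sum((v - mean) ** 2 for v in x) / n
    return delta2 / s2


print(delta2_over_s2([1, 2, 3, 4]))      # smooth, trending series: small ratio
print(delta2_over_s2([1, -1, 1, -1]))    # alternating series: large ratio
```

Positively autocorrelated residuals change little from one item to the next, so they pull the ratio down; this is why only the lower tail is of interest in the tests that follow.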


This ratio has been calculated by Klein for the residuals in his two 
papers and we have computed the ratios for the residuals in the other 
two papers [26]. Two ratios in each of Klein’s papers have been omitted 
as they refer to first differences of the economic series and are not com- 
parable for our purposes. It should be mentioned that the residuals 
given in Klein’s paper in Econometrica and the residuals given by Gir- 
shick and Haavelmo were calculated by the reduced-form method which 
presupposes that it is possible to solve for each of a number of jointly 
dependent variables in terms of exogenous variables and random error 
terms and these random error terms are simply linear combinations of 
the error terms given in the original system of equations [27]. The 
residuals obtained from Klein’s mimeographed paper and from Stone’s 
paper were calculated by the ordinary least squares method of regression. 
The total number of series considered is 43 and Table I shows them 
classified according to source and number of parameters used in each 
equation. The individual values of $\delta^2/s^2$ are illustrated on the scatter 
diagrams of Figures I-IV. 


TABLE I 
SUMMARY OF VALUES OF $\delta^2/s^2$ OBTAINED FOR VARIOUS RESIDUALS 

                                  Number     Number of parameters
Source of residuals               of years    3     4     5     6    Total
Klein: Econometrica                  22       2     7     2     1     12
Klein: Mimeographed study            20       1     7     1     -      9
Girshick and Haavelmo                20       2     2     1     -      5
Stone                                19       4     6     6     1     17
Total                                         9    22    10     2     43
$P(\delta^2/s^2 < 1.24) = 0.025$              7     5     4     -     16
$P(\delta^2/s^2 < 1.37) = 0.05$               8    10     4     -     22























The probability distribution of $\delta^2/s^2$ for a random series has been 
tabulated [28] for various $N$, where $N$ is the number of items. This distribution 
is symmetrical around $2N/(N-1)$ so that for $N = 20$ the expected 
value of $\delta^2/s^2$ for a random series is 2.11. This is the horizontal 
dotted line shown on the diagrams. In view of the high positive autocorrelation 
of economic time series and the reasons given for expecting 
error terms to be autocorrelated, there seems little chance of obtaining 
a value of $\delta^2/s^2$ around the upper tail of the distribution and, since we 
wish to minimize the risk of failing to reject a value of $\delta^2/s^2$ as coming 
from a random population, the appropriate test would seem to involve 
the use of the value of $\delta^2/s^2$ corresponding to the 5 per cent significance 
level, from the lower tail only. Since all our series are of approximately 
the same length, this value is 1.37 for $N = 20$. The value of $\delta^2/s^2$ (= 1.24) 
corresponding to the 5 per cent significance level which includes both 
tails has also been added.4 Out of the 43 series, 16 are significantly 
different from a random series at the 2½ per cent level, while 22 are 
significant at the 5 per cent level. These results indicate that in many 
cases the assumption of random error terms is not a very good approxi- 
mation to the truth. 

The sloping lines on Figures I-IV correspond to the average of 
twenty estimates of $\delta^2/s^2$ obtained from constructed relationships, described 
in subsequent paragraphs, in which the error terms were first 
summations of random series. It would seem more reasonable to consider 
that the values of $\delta^2/s^2$ are distributed around a line of this nature 
rather than around the horizontal random line. This suggestion is supported 
by the decreasing proportion of residuals which are significantly 
different from random series as the number of parameters in the relationship 
increases. From Table I it can be seen that the proportions 
which are significantly different from random are 8/9, 10/22, 4/10 and 
0/2 for 3, 4, 5 and 6 parameters respectively. 


Construction of an experimental model. The examination of the resid- 
uals obtained from actual economic relationships fails to reject the 
hypothesis that error terms are highly positively autocorrelated in a 
number of economic relationships. Little is known about the behaviour 
of relationships possessing autocorrelated error terms, so it was de- 
cided to construct several relationships of this type from artificial series 
and observe the results of applying least squares regression. The general 
form of the relationship adopted was— 


$$X_1 = k + b_{12.3t}\, X_2 + b_{13.2t}\, X_3 + b_{1t.23}\, t + u \tag{3.4}$$


where $X_2$, $X_3$ and $u$ were independently constructed series all possessing 
the same autoregressive structure, $t$ represented a linear time trend and 
the true values of the constants were $k = 0$, $b_{12.3t} = 2$, $b_{13.2t} = 1$ and 
$b_{1t.23} = 0$. Thus the actual equation used for the construction was: 


(3.5) X1 = 2X2 + X3 + u. 


Five sets of relationships of this form were constructed with different 
autoregressive structures, each set containing 20 equations. The series 
used were generated according to the following formulae :— 





4 Klein has taken the 5 per cent level of significance to include both tails of the distribution 
(Econometrica, op. cit., p. 114). 
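$$
\begin{aligned}
\text{A.}\quad & x_{t+1} = x_t + 0.3(x_t - x_{t-1}) + \epsilon_{t+1} \\
\text{B.}\quad & x_{t+1} = x_t + \epsilon_{t+1} \\
\text{C.}\quad & x_{t+1} = 0.3\, x_t + \epsilon_{t+1} \\
\text{D.}\quad & x_{t+1} = \epsilon_{t+1} \\
\text{E.}\quad & x_{t+1} = \epsilon_{t+1} - \epsilon_t
\end{aligned}
\tag{3.6}
$$


where the $\epsilon$'s denote series of random disturbances. Instead of stating 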
the precise form of the autoregressive equation each time a series is 
referred to we shall use the letters A, B, C, D and E as a convenient 
notation. 

The random elements were obtained from Tables of Random Sampling
Numbers [29]. Two-figure numbers were extracted, ignoring the
number 00, so that they ranged from 1 to 99. The number 50 was then
subtracted throughout so that we possessed a rectangular distribution
ranging from +49 to -49 with a true mean of zero. We then formed
60 independent series of these random elements, each one 20 items in
length, omitting a few numbers between each series so that we could
later extrapolate for forecasting. The application of these series in
groups of three to the relation (3.5) gave us the 20 equations of set D.
The other transformations were then formed from this basic set. For
example, the set of first summations, series B, was formed by making
the first term of each series zero and summing progressively over each
item of the random set. The calculations were simplified by using the
fact that C is the first difference of A, while B, D and E are
respectively the first summation of a random series, a random series,
and the first difference of a random series. It can easily be seen that
there were 21 items in series A and B, 20 in C and D and 19 in E. They
are therefore analogous in length to most available economic time series.
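
The construction just described can be sketched in a few lines of Python. This is an illustration only, not the authors' worksheet procedure: the function names are hypothetical, a pseudo-random generator stands in for the published random sampling numbers, and the starting value of zero for series C is an assumption.

```python
import random


def draw_random_elements(n, rng):
    """Two-figure numbers 01..99 with 50 subtracted: integers from -49 to +49."""
    return [rng.randint(1, 99) - 50 for _ in range(n)]


def build_series(eps):
    """Form series A to E from one run of random elements `eps`."""
    d = list(eps)                      # D: the random series itself

    b = [0]                            # B: first summation, first term zero
    for e in d:
        b.append(b[-1] + e)

    c = []                             # C: x[t+1] = 0.3 * x[t] + e[t+1]
    prev = 0.0                         # assumed starting value
    for e in d:
        prev = 0.3 * prev + e
        c.append(prev)

    a = [0.0]                          # A: first summation of C, so C = first difference of A
    for v in c:
        a.append(a[-1] + v)

    # E: first difference of the random series
    e_series = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return {"A": a, "B": b, "C": c, "D": d, "E": e_series}


rng = random.Random(1949)              # arbitrary seed, for reproducibility only
series = build_series(draw_random_elements(20, rng))
lengths = {name: len(values) for name, values in series.items()}
```

With 20 random elements the sketch gives 21 items for A and B, 20 for C and D and 19 for E, matching the lengths stated in the text.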

In each set a regression analysis was carried out with one explanatory
variable (in this case the error term became $(X_3+u)$); in several
of the sets the analysis was extended to two explanatory variables and,
in the case of set B, to three explanatory variables. In addition, the
statistic $\delta^2/s^2$ was calculated for the actual error terms and for
the residuals. A complete summary of these calculations is contained in
Table II.
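
For readers who wish to reproduce the statistic, a minimal sketch follows. It assumes von Neumann's definition of the ratio (mean square successive difference with divisor n-1, variance about the mean with divisor n); the function name is hypothetical.

```python
def von_neumann_ratio(x):
    """Ratio of the mean square successive difference to the variance."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / n                          # variance
    d2 = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)  # mean square successive difference
    return d2 / s2


smooth = von_neumann_ratio(list(range(21)))        # strongly positively autocorrelated
alternating = von_neumann_ratio([1, -1, 1, -1])    # strongly negatively autocorrelated
```

A smooth series gives a value near zero and a strictly alternating one a value near 4, with random series falling in the neighbourhood of 2.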


Bias introduced in estimating the autocorrelations of residuals. Given
a set of equations in which the explanatory variables and the error
terms possess the same autoregressive structure, can we say anything
about the way in which the autocorrelations of the residuals vary as
the number of explanatory variables is increased? Figure V presents
this information with each set labelled according to its autoregressive
structure. The number of parameters includes the constant term so that
we have one parameter when only the mean is estimated. Straight lines
have been fitted visually to the points for each set using as additional
points, except for D, the true values of $\delta^2/s^2$ which are zero for A and
B, 1.4 for C and 3.0 for E. The sets A, B and C show a marked bias
upwards as the number of parameters is increased. It is not expected
that this linearity would continue indefinitely but would flatten out
as more than four parameters are used and approach nearer and
nearer to the value of $\delta^2/s^2$ expected for a random series. The random
set D merely shows a distribution around the horizontal straight line
and when we pass to the series of first differences of random numbers E
there is only very slight evidence of a downward movement in the
values of $\delta^2/s^2$ with increasing parameters.⁵

Another way of illustrating the bias in the estimated autocorrelations
of the residuals as more variables are introduced is to apply our
previous test of significance to the individual values of $\delta^2/s^2$ obtained
in set B. This has been done in Table III. As the number of parameters
increases the proportion of residuals which yield a value of $\delta^2/s^2$
significantly different from that expected for a random series at the 5 per
cent level grows smaller; from 19/20 when only the mean is estimated
to only 10/20 when four parameters are used. This is a similar result
to that found for the residuals of actual economic relationships.


TABLE III
SIGNIFICANCE TESTS APPLIED TO RESIDUALS OF SET B

                                                 Number different from random
                                    Number of     at significance levels of     Total number
Explanation                         parameters    2½ per cent     5 per cent    of residuals

Actual error term                       1             19              20             20
                                        1             19              19             20
One explanatory variable                2             17              18             20
One explanatory variable + time         3             13              15             20
Two explanatory variables               3             11              14             20
Two explanatory variables + time        4              7              10             20





The amount of variance to be explained in an economic time series
can be regarded as composed of two parts, the first due to the smooth
movements of the autoregressive structure of the series and the second
due to the random disturbances. What is important for a real explanation
is that a proportion of the variance due to the disturbances should
be explained as well as that due to the general movement of the series.
Now quite high correlations between autocorrelated series may be
obtained purely by chance⁶ and when this happens what is largely
explained is the variance due to the regular movements through
time. The residuals of such a relationship will be essentially the
year-to-year fluctuations and of a more random character than the original
series. This can be illustrated by comparing the two cases in set B in
which two explanatory variables are used, one of which includes a
linear time trend and the other two real variables. From equations
(4) and (5) in Table II it can be seen that, while the inclusion of time
adds an amount of 0.026 less to the explanation of the variance of the
dependent variable than the inclusion of the second explanatory
variable, the average value of $\delta^2/s^2$ for the residuals is 0.023 greater. These
are the two points which are close together in Figure V for three
parameters and it can be seen that the addition of a linear time trend
in the explanation produced approximately the same bias as the
inclusion of a real explanatory variable. This is confirmed by the average
value of $\delta^2/s^2$ obtained when $X_2$, $X_3$ and $t$ are the explanatory
variables.

5 The first autocorrelation of the first differences of a random series is $r_1 = -0.5$, or $\delta^2/s^2 = 3.0$.

Since the inclusion of the bogus variable time had about the same
effect in biasing the residuals towards randomness as the inclusion of
real explanatory variables, we were curious about the effect of including
other types of non-related series in the explanation. We therefore
correlated two unrelated series, $X_2$ and $X_3$, of set B. With $X_2$ as the
dependent variable it was found that the average amount of the variance
explained was 0.32, while the mean value of $\delta^2/s^2$ for the residuals was
0.74. This latter value is slightly higher than that obtained for equation
(3) of Table II where the average explained variance is 0.64 with a mean
value of $\delta^2/s^2 = 0.69$ for the autocorrelation of the residuals. This
suggests that if error terms are autocorrelated then it would frequently
be a mistake to attempt to justify the statistical requirements of
randomness by adding more explanatory variables or by experimenting
with different combinations of the variables. Owing to the shortness
of economic time series, high accidental correlations may be obtained
between the variables added and the error term due to their
autoregressive structures and, since the residuals obtained from the least
squares method of regression are orthogonal to the explanatory
variables, they will tend to be biassed towards a random series.
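
The tendency described in this paragraph can be reproduced with a small simulation. The sketch below is not the authors' computation: it uses a pseudo-random generator, 500 replications rather than 20, and hypothetical function names, and it assumes von Neumann's definition of $\delta^2/s^2$. It regresses one series of form B on an unrelated series of the same form and compares the average ratio before and after the regression.

```python
import random


def von_neumann_ratio(x):
    """Mean square successive difference divided by the variance."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / n
    d2 = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return d2 / s2


def random_walk(n, rng):
    """Series of form B: first summation of random elements, first term zero."""
    x = [0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.randint(1, 99) - 50)
    return x


def ols_residuals(y, x):
    """Residuals from the least squares fit of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [v - a - b * u for u, v in zip(x, y)]


def simulate(reps=500, n=21, seed=0):
    """Average ratio of the series itself and of its residuals on an unrelated series."""
    rng = random.Random(seed)
    before = after = 0.0
    for _ in range(reps):
        y = random_walk(n, rng)
        x = random_walk(n, rng)        # generated independently of y
        before += von_neumann_ratio(y)
        after += von_neumann_ratio(ols_residuals(y, x))
    return before / reps, after / reps


mean_before, mean_after = simulate()
```

On average the residuals show a higher ratio than the original series, even though the regressor has no real connection with the dependent variable.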





6 See G. U. Yule, op. cit., and Orcutt and James, op. cit.

FIGURE V. AUTOCORRELATION OF RESIDUALS OBTAINED FROM
CONSTRUCTED RELATIONSHIPS


ESTIMATION OF REGRESSION COEFFICIENTS AND PREDICTION 
BY LEAST SQUARES FOR RELATIONSHIPS CONTAINING 
AUTOCORRELATED ERROR TERMS 


Our objectives in this section are to show that the usual application 
of the method of least squares to relationships containing highly 
positively autocorrelated error terms results in an extremely inef- 
ficient use of data and that it is only necessary to apply a transforma- 
tion which will make the error term approximately random in order to 
regain most of this efficiency. 

The complete information is contained in Table II but in order to
illustrate the position more clearly we have set out some of the more
relevant calculations in Tables IV and V.

TABLE IV
VARIANCES OF REGRESSION PARAMETERS UNDER DIFFERENT
TRANSFORMATIONS USING ONE EXPLANATORY VARIABLE

Generating properties of     Values of $\delta^2/s^2$ for      Variance of
Explanatory    Error         Error                    Correlation    Regression
variable       term          term       Residuals     coefficient    coefficient

    A            A           0.31         0.49          0.067          0.927
    B            B           0.45         0.69          0.049          0.755
    C            C           1.49         1.56          0.005          0.175
    D            D           1.98         2.00          0.0056         0.111
    E            E           3.00         3.01          0.008          0.127




















The decline in the variances of both the correlation coefficient and 
the regression coefficient as the error term becomes random is very 
marked. In the case of one explanatory variable the variance of the 
correlation coefficient when the error term is of form A is approxi- 
mately 11 times the variance when the error term is random, while 
the ratio of the corresponding variances of the regression coefficient is 
approximately 9 to 1. As we introduce more determining variables into 
the explanation, we can see from Table V that the variances of the 
regression coefficients decrease until in the limiting case all the varia- 
tion in the variable to be determined is explained and there is a com- 
plete set. This limiting case is of course very rarely approached in 
practice and if we consider the set B, where for three explanatory
variables the mean multiple correlation coefficient is as high as 0.97 (see
Table II, equation 6), we find the variances of the regression
coefficients are 0.22 and 0.16 for $b_{12}$ and $b_{13}$ respectively, which from
Table V can be seen to be three times the variances of the regression
coefficients calculated in the random transformation even though the
mean multiple correlation coefficient in this form is only 0.93.


TABLE V
VARIANCES OF REGRESSION PARAMETERS UNDER DIFFERENT
TRANSFORMATIONS USING TWO EXPLANATORY VARIABLES

Generating properties of     Values of $\delta^2/s^2$ for      Variance of
Explanatory    Error         Error                    Multiple correlation    Regression coefficients
variable       term          term       Residuals     coefficient             $b_{12}$      $b_{13}$

    B            B           0.31         1.06          0.012                 0.34       0.48
    D            D           2.14         2.15          0.0004                0.08       0.05
    E            E           3.05         2.93          0.001                 0.09       0.10

In Table IV we can see that fluctuations in the variances of the
regression parameters are very small for reasonably large movements
of $\delta^2/s^2$ around the random value, given by the results for C, D and E.
The true values of the autocorrelation coefficients of the error terms
vary from $r_1 = 0.3$ to $r_1 = -0.5$ in these cases. This relative stability
of the variances indicates that a transformation which makes the
error term approximately random will have regained most of the
improvement in the efficiency possible. Similar results would also
appear to be true for the case of two explanatory variables.

In our model there is no real trend, yet the introduction of a linear 
trend to sets A and B improves their explanation and reduces the 
variance of the regression coefficients. This would seem to be due to 
the fact already considered that the trend factor reduces the amount 
of autocorrelation in the residuals and can be regarded as one method 
of transforming the error term. In these circumstances the introduction 
of a polynomial trend may be a useful device in obtaining more 
accurate results, but it is difficult to attach an economic meaning to 
the coefficients of time. 

In order to obtain some idea of the accuracy of estimation of re- 
gression parameters under other possible types of relationships and to 
illustrate once more the importance of having the error term random, 
we constructed from the series already calculated two sets of relation- 
ships in which the autoregressive structure of the explanatory vari- 
able and the error term were different. The form of the relationship 
was— 


(4.1)    $X_1 = k + b_{12}X_2 + v$


where $X_2$ was of form A in both sets and $v$ adopted first form B and
second form D. The true values of the constants were $k=0$ and
$b_{12}=2$ while the error term was taken from our previous sets with
$v = X_3 + u$. The first differences of each set were calculated and then a
further correction was made to randomize the explanatory variable.
This latter process produced error terms generated by the following
formulae:

         F.  $\zeta_{t+1} = \epsilon_{t+1} - 0.3\epsilon_t$
(4.2)
         G.  $\zeta_{t+1} = (\epsilon_{t+1} - \epsilon_t) - 0.3(\epsilon_t - \epsilon_{t-1})$


where the $\epsilon$'s denote a series of random disturbances. The results of
the calculations are set out in Table VI and the values of $\delta^2/s^2$ provide
additional points for Figure V. In each set it can be seen that a
considerable gain is to be obtained in the efficiency of the estimates of the
correlation coefficients and regression coefficients when the error term
is random. If error terms are really random as postulated by many
economists, there is nothing to be gained from making any
transformation, even though the original series possess high positive
autocorrelation. It can also be seen from the mean values of the regression
coefficients of Tables II and VI that the least squares estimates are not
biassed when the error term is autocorrelated even though they are not
the best estimates.


Tests of Significance. It is well recognized that the ordinary test of
significance for the null hypothesis can be applied to the correlation
between two series provided one of them is random.⁷ This can be seen
to be equivalent to making the error term random in the special case
of a zero regression coefficient. To apply confidence limits it is
necessary that the dependent variable be distributed normally and randomly
around a linear function of the explanatory variable. This is true
even if the explanatory variable is not random.⁸ If economic time series
possess the properties which we are suggesting, then the transformation
to make the error terms random will put them in a form in which it
will be possible to apply confidence limits and test the significance of
regression parameters in the ordinary way.

Prediction. Prediction is one of the primary reasons for undertaking 
statistical analysis. In Table VII we present some material derived from 
our constructed relations which emphasizes the huge improvement that 
it is possible to make if one is dealing with a formulation involving 
error terms which are a first summation of random elements. This 
table also indicates how misleading the variance of the residuals may 
be in such a case. 

The fact that the items in column IV are smaller than those in
column V is, of course, to be expected, since the regression parameters
have been chosen to minimize the mean square of the residuals and the
true errors are those obtained by use of the true values of the regression
parameters. In the cases of random error terms, rows 2 and 5, this
downward bias is small and could, if desired, be easily compensated
by taking account of the number of parameters fitted. In the cases
of error terms which are the first summation of random numbers, the
downward bias is exceedingly large for series of this length and should
emphasize the caution needed in interpreting standard errors of
estimate if the error terms are likely to be highly positively
autocorrelated. Column VI gives the variance of the errors of prediction one
item beyond the parts of the series utilized for estimating the
regression parameters. That is, each of the series involved in each set of twenty
equations was extended one item and the dependent variable then
predicted with a knowledge of the regression coefficients previously
calculated. Column VI again illustrates in a rather simple way how
misleading the variance of residuals may be when the error terms are
autocorrelated, as in rows 1, 3 and 4. It should of course be realized that the
much smaller variances obtained in rows 2 and 5 are due both to the
fact that better estimates of the regression parameters have been
obtained and used in these cases and also that the prediction formula
makes use of the fact that the errors involved in rows 1, 3 and 4 are
the first summation of random numbers. Thus, whereas in row 1 the
estimating formula was


(4.3)    $X_{1,n+1} = a_1 + b_{12}X_{2,n+1}$


in row 2, the estimating formula was

(4.4)    $X_{1,n+1} = a_1' + b_{12}'(X_{2,n+1} - X_{2,n}) + X_{1,n}.$


The errors involved in the prediction formula (4.4) are therefore
random in time whereas those in (4.3) are first summations of random
terms.

7 See M. S. Bartlett, “Some Aspects of the Time-Correlation Problem in regard to Tests of Significance,” Journal of the Royal Statistical Society, Vol. 98, 1935, pp. 536-543.

8 See R. A. Fisher, “The Goodness of Fit of Regression Formulae and the Distribution of Regression Coefficients,” Journal of the Royal Statistical Society, Vol. 85, 1922, pp. 597-612, and H. Cramér, “Mathematical Methods of Statistics,” op. cit., pp. 548-555.
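
The contrast between the two prediction formulae can be illustrated with a simulation. The sketch below is a simplified stand-in for the authors' experiment, not a reproduction of it: it uses a single random-walk error term in place of $(X_3+u)$, a pseudo-random generator, 500 replications, and hypothetical function names.

```python
import random


def ols(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def random_walk(n, rng):
    """First summation of random elements ranging from -49 to +49."""
    x = [0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.randint(1, 99) - 50)
    return x


def prediction_errors(rng, n=21):
    """One-item-ahead errors of the level form (4.3) and the difference form (4.4)."""
    x2 = random_walk(n + 1, rng)
    u = random_walk(n + 1, rng)                    # error term: first summation of random numbers
    x1 = [2 * p + q for p, q in zip(x2, u)]        # true relation X1 = 2*X2 + u

    # (4.3): fit in levels on the first n items, predict item n+1
    a, b = ols(x2[:n], x1[:n])
    err_levels = x1[n] - (a + b * x2[n])

    # (4.4): fit in first differences, then add back the last observed level
    dx2 = [x2[t] - x2[t - 1] for t in range(1, n)]
    dx1 = [x1[t] - x1[t - 1] for t in range(1, n)]
    a_d, b_d = ols(dx2, dx1)
    err_diff = x1[n] - (a_d + b_d * (x2[n] - x2[n - 1]) + x1[n - 1])
    return err_levels, err_diff


def mean_square_errors(reps=500, seed=0):
    rng = random.Random(seed)
    total_levels = total_diff = 0.0
    for _ in range(reps):
        e_lev, e_dif = prediction_errors(rng)
        total_levels += e_lev ** 2
        total_diff += e_dif ** 2
    return total_levels / reps, total_diff / reps


ms_levels, ms_diff = mean_square_errors()
```

Under these assumptions the mean square error of the difference-form prediction is much the smaller of the two, in line with the comparison of rows 1 and 2 in Table VII.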


TABLE VII
A COMPARISON OF THE VARIANCES OF RESIDUALS, TRUE ERRORS AND
PREDICTIONS OBTAINED FROM SEVERAL TRANSFORMATIONS OF
THE CONSTRUCTED RELATIONS

       Generating properties of                    Mean           Mean           Variance of errors
       Explanatory    Error      Number of         variance       variance of    of predictions one
No.    variable       term       explanatory       of residuals   true errors    item beyond sample
                                 variables
           I            II          III               IV              V               VI

 1         B            B           1                 5142           7725            7479
 2         D            D           1                 1375           1466             933
 3         B            B           2+time             784           4386            7127
 4         B            B           2                 1690           4386            3991
 5         D            D           2                  634            749             774


A TENTATIVE METHOD OF PROCEDURE


Having recognized that the error terms implicit in many current 
formulations of economic relations are highly positively autocorre- 
lated, and also having recognized the importance of carrying out 
estimation and prediction by means of relations involving random 
error terms, how shall we proceed when faced with a practical situa- 
tion? One way of evading this problem would be to change some of the 
variables, add additional variables, or modify the form of the relation 
until a relationship involving what appear to be random error terms 
is found. However, while this may possibly be a satisfactory way out in 
some cases, it obviously does not help much if by some means or other 
one has arrived at what is believed to be the most reasonable choice 
of variables and form of relation. This choice of variables and form 
of relation usually does not involve any specification of whether or not 
the errors are autocorrelated and what is required is the best method 
of estimating the parameters and various standard errors of the 
chosen relation, and not some other relation. In this situation the 
objective, of course, is to make an autoregressive transformation of the 
dependent and independent variables such that the error term becomes 
random. If the autoregressive properties of the error term were known, 
then it would simply be a matter of making the indicated autoregres- 
sive transformation as illustrated in section 2. The real problem arises 
when the autoregressive properties of the error term are not known 
but must be estimated. Except for the fact, which our experiments 
demonstrate, that nearly optimum results can be achieved if the error 
term is only a rough approximation to a random series, solution of the 
problem would seem rather hopeless for series of only twenty items. 

One fairly obvious procedure, which we are inclined to rule out be- 
cause of the large biases demonstrated in section 3, would be the 
following iterative process. First estimate the desired regression co- 
efficients by ordinary least squares and obtain the resulting series of 
residuals. Then estimate from those residuals by least squares the auto- 
regressive parameters of a one or two lag difference equation. Use these 
autoregressive parameters to make an autoregressive transformation 
of the observed series aimed at randomizing the error term, and re- 
estimate the desired regression coefficients. Put these revised estimates 
back in the original equation, obtain the resulting series of residuals and 
estimate their autoregressive parameters. Use these to make a new 
autoregressive transformation of the original series and so on until 
estimates of the desired regression coefficients are obtained which
are consistent with estimates of the autoregressive parameters of
the residuals in the sense that no further adjustments are necessary.
Since it is only necessary to make the error term approximately random
it is unlikely that much would be gained by carrying the above process
more than one or two rounds. The real difficulty with this procedure
is that the series of residuals will, as shown in section 3, be strongly
biassed towards randomness and therefore the autoregressive
transformation based in the above way on the residuals may not in fact go
far enough in randomizing the error term.
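
The iterative process is described above only in words, and the authors are inclined to rule it out; the following sketch merely makes the steps concrete for one explanatory variable. The one-lag scheme fitted without a constant, the way the constant term is recovered from the original series, the simulated data and all names are assumptions made for illustration.

```python
import random


def ols(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def lag_one_coefficient(res):
    """Least squares coefficient of res[t] on res[t-1] (one-lag difference equation)."""
    num = sum(res[t] * res[t - 1] for t in range(1, len(res)))
    den = sum(res[t - 1] ** 2 for t in range(1, len(res)))
    return num / den


def iterate(x, y, rounds=2):
    """Alternate between estimating the regression and the residual autoregression."""
    a, b = ols(x, y)                                   # ordinary least squares start
    rho = 0.0
    for _ in range(rounds):
        res = [yt - a - b * xt for xt, yt in zip(x, y)]    # residuals in the original equation
        rho = lag_one_coefficient(res)
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        _, b = ols(xs, ys)                             # re-estimate the slope on transformed series
        a = sum(y) / len(y) - b * sum(x) / len(x)      # assumed way of recovering the constant
    return a, b, rho


# Illustrative data: true slope 2, error term following a one-lag scheme with coefficient 0.7
rng = random.Random(0)
x, y, u = [], [], 0.0
for t in range(60):
    u = 0.7 * u + rng.gauss(0, 1)
    xt = 0.5 * t + rng.gauss(0, 1)
    x.append(xt)
    y.append(1.0 + 2.0 * xt + u)

a_hat, b_hat, rho_hat = iterate(x, y)
```

Because the residuals are biassed towards randomness, the estimated lag coefficient will tend to fall short of the true value, which is the difficulty the text points out.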

An alternative procedure which appears more promising to us is 
that of selecting an autoregressive transformation of the series involved 
such that the autocorrelations of the series of residuals are approxi- 
mately equal to the expected values of autocorrelations of random 
series of the same length. We have not worked out an efficient pro- 
cedure for doing this; but, if one is willing to approximate the auto- 
regressive properties of the error term by a one or even two lag linear 
difference equation, it is fairly easy after one or two trials to choose an 
autoregressive transformation which will result in residuals that are 
sufficiently random. Furthermore, if our evidence that many error
terms appear to be approximately first summations of random terms is
accepted, then the obvious procedure is to work with first differences of
the series used. Thus, given a relation between ordinary economic
variables


(5.1)    $X_{1t} = a_1 + b_{12}X_{2t} + b_{13}X_{3t}$


we suggest as a first approximation estimation and prediction in the
form


(5.2)    $(X_{1t} - X_{1,t-1}) = b_{12}(X_{2t} - X_{2,t-1}) + b_{13}(X_{3t} - X_{3,t-1}).$


If (5.1) had contained a linear trend then (5.2) would have contained a 
constant term. The residuals from (5.2) can be obtained and tested 
for randomness. 

If we prove to be right about the nature of most error terms in 
current formulations of economic relations, then the residuals of the 
first difference transformation will turn out to be sufficiently random 
and no further steps will be necessary. If the residuals in this form do 
not turn out to be sufficiently random, then a new transformation can 
be devised on the basis of their autocorrelations. The main advantages
of this procedure are, first, that in many cases it will result
immediately in the correct transformation and, secondly, that when it
does not it will usually result in residuals that are not highly positively
autocorrelated and thereby reduce the amount of bias towards
randomness which is present in this case. This will be a help in devising
successive autoregressive transformations.

On the basis of this study Richard Stone⁹ has recalculated a number
of demand studies for the United Kingdom 1920-38. The general
results will be published by Stone, but he has kindly made available
to us the material presented in Table VIII. We present this material
as further evidence that in many cases the use of first differences does
result in essentially random series. It also seems reassuring, in so far as
Stone's work is concerned, and rather remarkable, that in most cases
the multiple correlations for the relations in first difference form
remained very high.

TABLE VIII
VALUES OF $\delta^2/s^2$ FOR A NUMBER OF DEMAND STUDIES FOR THE
UNITED KINGDOM 1920-38

                                         Values of $\delta^2/s^2$ for       Adjusted multiple
                                         residuals                   correlation coefficient
                                         Original     First          Original     First
Commodity                  Parameters    data         differences    data         differences

Beer                       3             1.28         1.86           0.989        0.962
                           4             1.13         2.01           0.989        0.977
                           4+time        1.23                        0.993
Spirits                    3+time        1.26         2.63           0.992        0.875
Telegrams                  3             1.24         1.61           0.985        0.967
                           4+time        [illegible]  1.65           0.987        0.966
Imported wine              4             1.49         1.84           0.893        0.754
Communication services     3+time        0.71         2.05           0.996        0.834
                           4+time        0.70         2.11           0.996        0.822
Lard                       3+time        0.90         2.06           0.838        0.864
Margarine                  4             1.26         1.80           0.959        0.748
                           4+time        2.02                        0.969
                           5+time        2.31         2.31           0.976        0.756

Mean value of $\delta^2/s^2$                    1.28         1.99





APPENDIX TO SECTION II 


It is of interest to compare the simple solution presented in section
II with the general solution given by Aitken [30]. We shall not repeat
his elegant and rigorous proofs but shall merely illustrate his approach
and deduce the special case where the error series follows a simple
Markoff scheme. For this it is necessary to follow his generality of
notation and employ matrices and vectors, using $P'$ and $y'$ to denote the
matrix or vector obtained by transposing $P$ or $y$ and $P^{-1}$ as the
inverse matrix of $P$.

9 These studies were originally given in his paper on “Analysis of Demand,” op. cit., but the recalculations were made on the basis of revised estimates of the data.

Consider first the simple case of least squares with non-autocorrelated
errors. Let the approximate representation of the column vector
of data


(6.1)    $y = \{y_1\, y_2 \cdots y_n\}$

by the column vector

(6.2)    $z = \{z_1\, z_2 \cdots z_n\}$


be linear in terms of a set of $(k+1)$ prescribed functions

(6.3)    $1,\; x_{1t},\; x_{2t},\; \ldots,\; x_{kt} \qquad (t = 1, \ldots, n).$


Let $P$ denote the matrix of these functional values so that the $i$th
row of $P$ is the row vector


(6.4)    $[1,\; x_{1i},\; x_{2i},\; \ldots,\; x_{ki}].$


Then $P$ is of order $n \times (k+1)$ and with the restriction of linear
independence over the $n$ values $x_{i1}, \ldots, x_{in}$, it is of rank $(k+1)$. Let $a$
denote a column vector of $(k+1)$ coefficients


(6.5)    $a = \{a_0\, a_1\, a_2 \cdots a_k\}.$

Then the set of values $z_t$ is the vector

(6.6)    $z = Pa.$


If the data $y$ are independent then the principle of least squares
minimizes the sum of the squared residuals. This is the vector product


(6.7)    $s^2 = (y - Pa)'(y - Pa)$


and for the minimal conditions $\partial s^2/\partial a = 0$ we obtain the set of normal
equations


(6.8)    $P'Pa = P'y.$


Having established this general result for least squares, Aitken
extends the argument to the case of autocorrelated errors. If the set of
errors be arranged according to their variances and covariances by the
elements of a symmetric matrix $U$ of order $n \times n$, then the least squares
estimates are obtained by minimizing

(6.9)    $(y - Pa)'U^{-1}(y - Pa).$


Differentiating in the manner above, we obtain the set of more general
normal equations

(6.10)    $P'U^{-1}Pa = P'U^{-1}y.$
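
The differentiation step can be written out explicitly. The lines below are a standard matrix-calculus sketch that uses only the symmetry of $U^{-1}$; they are not quoted from Aitken's paper, and the symbol $S(a)$ for the quantity in (6.9) is introduced here for convenience.

```latex
\begin{aligned}
S(a) &= (y - Pa)'U^{-1}(y - Pa) \\
     &= y'U^{-1}y - 2a'P'U^{-1}y + a'P'U^{-1}Pa, \\
\frac{\partial S}{\partial a} &= -2P'U^{-1}y + 2P'U^{-1}Pa = 0
\quad\Longrightarrow\quad P'U^{-1}Pa = P'U^{-1}y .
\end{aligned}
```

Putting $U$ equal to the unit matrix recovers the ordinary normal equations (6.8).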


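Let us now apply these general results to a simple specific example.
Suppose we have a linear relation


(6.11)    $Y_t = a_0 + a_1X_t + u_t \qquad (t = 1, \ldots, n)$

where $u_t$ is defined by the simple Markoff process

(6.12)    $u_t = \beta u_{t-1} + \epsilon_t \qquad (\beta \le 1)$


where $\beta$ is a known constant and $\epsilon_t$ a random disturbance. Our
variance-covariance matrix of error may be defined by the symmetric
matrix of order $n \times n$, where we have assumed unit variance of $\epsilon_t$ for
simplicity, although the final result would not be altered if we did not:


(6.13)    $U = \dfrac{1}{1-\beta^2}\begin{bmatrix} 1 & \beta & \beta^2 & \cdots & \beta^{n-1} \\ \beta & 1 & \beta & \cdots & \beta^{n-2} \\ \beta^2 & \beta & 1 & \cdots & \beta^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta^{n-1} & \beta^{n-2} & \beta^{n-3} & \cdots & 1 \end{bmatrix}$


from which we obtain the symmetric inverse matrix


(6.14)    $U^{-1} = \begin{bmatrix} 1 & -\beta & 0 & \cdots & 0 & 0 \\ -\beta & 1+\beta^2 & -\beta & \cdots & 0 & 0 \\ 0 & -\beta & 1+\beta^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\beta^2 & -\beta \\ 0 & 0 & 0 & \cdots & -\beta & 1 \end{bmatrix}$
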
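
The inverse can be checked numerically. The sketch below assumes $|\beta| < 1$ and unit variance of the random disturbance, so that the covariance of $u_i$ and $u_j$ is $\beta^{|i-j|}/(1-\beta^2)$; a different scalar factor on $U$ would rescale the inverse without affecting the normal equations. The function names and the values $n = 5$, $\beta = 0.6$ are illustrative.

```python
def error_covariance(n, beta):
    """Covariance matrix of a stationary one-lag Markoff error with unit-variance disturbances."""
    scale = 1.0 / (1.0 - beta ** 2)
    return [[scale * beta ** abs(i - j) for j in range(n)] for i in range(n)]


def tridiagonal_inverse(n, beta):
    """Tridiagonal matrix: 1 at the two corner diagonal cells, 1 + beta^2 elsewhere, -beta beside."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0 if i in (0, n - 1) else 1.0 + beta ** 2
        if i + 1 < n:
            m[i][i + 1] = m[i + 1][i] = -beta
    return m


def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


# The product should be the unit matrix up to rounding error.
product = matmul(error_covariance(5, 0.6), tridiagonal_inverse(5, 0.6))
```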
The matrix $P$ is of order $n \times 2$ where the $t$th row is

(6.15)    $[1 \;\; X_t]$


while the vector of coefficients becomes the column vector


(6.16)    $a = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}.$


Applying these components to the general normal equations (6.10)
and expanding we obtain the estimate of $a_1$ as


(6.17)    $\hat{a}_1 = \dfrac{\sum_{1}^{n} x_t y_t - \beta\sum_{2}^{n} x_t y_{t-1} - \beta\sum_{2}^{n} x_{t-1} y_t + \beta^2\sum_{3}^{n} x_{t-1} y_{t-1}}{\sum_{1}^{n} x_t^2 - 2\beta\sum_{2}^{n} x_t x_{t-1} + \beta^2\sum_{3}^{n} x_{t-1}^2}$


where $x_t$, $y_t$ are in terms of deviations from their means which are
given by


(6.18)    $\bar{x} = \dfrac{1}{n - \beta(n-2)}\left(\sum_{1}^{n} X_t - \beta\sum_{2}^{n-1} X_t\right).$


These are completely general results for error terms of the simple
type considered and do not involve any assumptions about the
distribution of the random disturbances $\epsilon_t$. If the $\epsilon_t$ are normally distributed
then we have a maximum likelihood solution.

Comparing the estimate (6.17) with that obtained by our modified
transformation procedure of section II we have from (6.11) and (6.12)


(6.19)    $Y_t - \beta Y_{t-1} = a_0(1-\beta) + a_1(X_t - \beta X_{t-1}) + \epsilon_t$


where the least squares estimate of $a_1$ is


(6.20)    $\tilde{a}_1 = \dfrac{\sum_{2}^{n} x_t y_t - \beta\sum_{2}^{n} x_t y_{t-1} - \beta\sum_{2}^{n} x_{t-1} y_t + \beta^2\sum_{2}^{n} x_{t-1} y_{t-1}}{\sum_{2}^{n} x_t^2 - 2\beta\sum_{2}^{n} x_t x_{t-1} + \beta^2\sum_{2}^{n} x_{t-1}^2}$

where the means are calculated by


(6.21)    $\dfrac{1}{n-1}\sum_{2}^{n} X_t \quad\text{and}\quad \dfrac{1}{n-1}\sum_{2}^{n} X_{t-1}.$


If we represent the numerator and denominator of (6.20) by $A$ and $B$
respectively we obtain


(6.22)    $\tilde{a}_1 = \dfrac{A}{B}$


so that the estimator given by (6.17) is


(6.23)    $\hat{a}_1 = \dfrac{A + x_1 y_1(1-\beta^2)}{B + x_1^2(1-\beta^2)}.$


The reason for this difference is that $\tilde{a}_1$ ignores the possibility of
making use of the first error term $u_1$ and estimates the regression
coefficients using only $(n-1)$ transformed terms. The sum of squares
of the $(n-1)$ terms is


(6.24)    $\sum_{2}^{n} \epsilon_t^2 = \sum_{2}^{n} (u_t - \beta u_{t-1})^2.$


The first term may be introduced by using the fact that the expected
value of $\epsilon_1^2$ given $u_1$ is


(6.25)    $E(\epsilon_1^2) = (1-\beta^2)u_1^2$

so that


(6.26)    $s^2 = \sum_{1}^{n} \epsilon_t^2 = \sum_{2}^{n} (u_t - \beta u_{t-1})^2 + (1-\beta^2)u_1^2.$

If we substitute for the $u$'s in terms of $x$ and $y$ from (6.11) and
minimize in the ordinary way with respect to $a_0$ and $a_1$ we again obtain
the solutions (6.17) and (6.18). It can be seen therefore that $\tilde{a}_1$ is an
unbiased estimate of $a_1$ but by ignoring the first term a maximum of
one degree of freedom is lost in the transformation procedure as $\beta$
approaches zero. As $\beta$ approaches unity the difference between $\hat{a}_1$ and
$\tilde{a}_1$ approaches zero and when $\beta = 1$ the solutions (6.17) and (6.20) are
identical and the obvious course is to make a first difference
transformation.

In the case of multivariate regression the procedure of transforming 
the variables and applying ordinary least squares analysis provides a 
much simpler solution than the method indicated by (6.17). The trans- 
formation procedure also provides a simpler solution in the case 
where the autoregressive structure of the error term comprises a linear 
stochastic difference equation involving two or more lagged terms.

Haavelmo, op. cit., especially p. 85.

[28] J. von Neumann, “Distribution of the Ratio of the Mean Square Successive 
Difference to the Variance,” Annals of Mathematical Statistics, Vol. 12, pp. 
367-395; B. S. Hart and J. von Neumann, “Tabulation of the Probabilities 
for the Ratio of the Mean Square Successive Differences to the Variance,” 
Annals of Mathematical Statistics, Vol. 13, pp. 207-214. 

[29] M. G. Kendall and B. Babington Smith, Tracts for Computors No. 24, Cam- 
bridge University Press, 1939. 

[30] A. C. Aitken, “On Least Squares and Linear Combinations of Observations,”
op. cit.





AOQL SINGLE SAMPLING PLANS FROM 
A SINGLE CHART AND TABLE* 


DONALD J. GREB
Chief, Quality Control Engineer, Minneapolis-Honeywell Regulator Co.,
Minneapolis, Minnesota
AND
JULIO N. BERRETTONI
Consultant Economist and Statistician
Minneapolis, Minnesota


This paper presents a single chart and table from which 
AOQL (Average Outgoing Quality Limit) Single Sampling 
Plans may be determined with ease. These plans yield a close 
approximation to minimum inspection both for unknown in- 
coming quality and for known average incoming quality unless 
the variation in quality from lot to lot is extremely small. 


AOQL SINGLE SAMPLING PLANS 


CHART I and Table II present a set of AOQL Single Sampling Plans. 
Their manipulation is simple. Given AOQL and lot size (N), locate on 
the chart the c-zone of their point of intersection. For example, if 
AOQL=1% and N=1000, the point of intersection on the chart falls 
between the two parallel diagonal lines of zone c=1.¹ This value of c 
is the acceptance number: the largest number of defectives in the 
sample that still permits acceptance of the lot. The sample size 
corresponding to the value of AOQL and c is found in the Table of 
Sample Sizes (Table II); for AOQL=1% and c=1 the sample size is 84. 
The action that follows is to sample 84 pieces from a lot of 1000: if 
one defective or none is found, accept the lot without further 
inspection; if more than one defective is found, reject the lot for 
complete sorting. The results to be expected are (1) there is an 
absolute guarantee that over a series of lots the average per cent 
defective will not exceed the selected value of AOQL, and (2) unless 
the variation of incoming quality from lot to lot is very small,² the 
AOQL will be maintained with an amount of inspection that, for 
practical purposes, closely approximates the minimum inspection which 
could be obtained if incoming quality from lot to lot were known. 
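The disposition rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `disposition` and its return strings are assumptions, not anything defined in the paper.

```python
def disposition(defectives_in_sample: int, c: int) -> str:
    """Lot disposition under a single sampling plan with acceptance number c."""
    if defectives_in_sample < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    # Accept when the sample holds c or fewer defectives; otherwise the
    # whole lot is sorted (100% inspection).
    return "accept" if defectives_in_sample <= c else "sort entire lot"


# Plan from the text: AOQL = 1% and N = 1000 give c = 1 and n = 84.
print(disposition(1, c=1))
print(disposition(2, c=1))
```

With the plan from the example, one defective in the sample of 84 leads to acceptance and two lead to complete sorting.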





* Acknowledgment is made to Mr. P. M. Brink of Minneapolis-Honeywell Regulator Co. for in- 
dispensable assistance with the calculations and for his excellent work in drawing the graphs. 

1 If the point of intersection falls on a line, use the zone directly below. 

2 The word “small” is used here in the sense of being somewhat less than the ±3σ limits of normal 
variation. 

The limitation of Chart I is that it assumes sample size is small relative 
to the lot size. If this is not the case, the sample size is larger than 
need be. However, the limitation is not of a serious nature as the differ- 
ence between sample size with or without the assumption that n is 
small relative to N is not of a large order unless the lot size is very small. 

These plans are designed for use whenever there is a desire to main- 
tain an average quality over a series of lots. Thus, they may be used 
advantageously for inspections between operations, departments, sub- 
assemblies, receiving inspection, finished products, etc. 


THE BACKGROUND OF CHART I³ 


The Derivation of Combinations of Sample Size and Acceptance Number 
Yielding a Selected Value of AOQL. 
The formula for average outgoing quality in terms of the hypergeometric 
distribution is⁴ 

$$\mathrm{AOQ} = \sum_{m=0}^{c}\left(p - \frac{m}{N}\right)\frac{\binom{Np}{m}\binom{N-Np}{n-m}}{\binom{N}{n}} \tag{1}$$

where m is the number of defectives in a sample of size n, c is the 
acceptance number and p is the per cent defective of a lot of size N. 
Assume that p is less than or equal to ten per cent and that sample size 








3 The following will serve as a useful list of references: Dodge, H. F. and Romig, H. G., Sampling 
Inspection Tables, John Wiley & Sons, Inc., New York, 1945. Freeman, H. A., Friedman, M., Mosteller, 
F., and Wallis, W. A., Sampling Inspection, McGraw-Hill Book Co., Inc., New York, 1948. Grant, 
E. L., Statistical Quality Control, McGraw-Hill Book Co., Inc., New York, 1946. Hoel, P. G., Introduction 
to Mathematical Statistics, John Wiley & Sons, Inc., New York, 1946. Peach, Paul, An Introduction 
to Industrial Statistics and Quality Control, Edwards and Broughton Co., Raleigh, N. C., 1945. Wilks, 
S. S., Mathematical Statistics, Princeton University Press, Princeton, N. J., 1947. Working, Holbrook, 
A Guide to the Utilization of the Binomial and Poisson Distributions in Industrial Quality Control, 
Stanford University Press, Stanford University, California, 1943. 

Churchman, C. W. and Epstein, B., Tests of Increased Severity, Journal of the American Statistical 
Association, Vol. 41, No. 236, December, 1946, pp. 567-590. Dodge, H. F., A Sampling Inspection Plan 
for Continuous Production, The Annals of Mathematical Statistics, Vol. XIV, No. 3, September, 1943, 
pp. 264-279. Wald, A. and Wolfowitz, J., Sampling Inspection Plans for Continuous Production Which 
Insure a Prescribed Limit on the Outgoing Quality, The Annals of Mathematical Statistics, Vol. XVI, 
No. 1, March 1945, pp. 30-49. 

Army Service Forces, Office of the Quartermaster General, Sampling for Quality Control (Supervisor’s 
Edition), December, 1945. Navy Department, General Specification for Inspection of Material, 
Appendix X, Standard Sampling Inspection Tables for Inspection by Attributes, April 1946, United 
States Government Printing Office, Washington, D. C. 

4 Wilks, S. S., op. cit., p. 223 and Hoel, P. G., op. cit., p. 224. 

is greater than ten. Then a close approximation is established by the 
substitution of the Poisson for the hypergeometric distribution. Further 
assume that N is large relative to n and therefore also to m so that 
n/N and m/N are considered negligible. Given these assumptions, 
equation (1) reduces to⁵ 


$$\mathrm{AOQ} = p \sum_{m=0}^{c} \frac{e^{-np}(np)^{m}}{m!} \tag{2}$$

Upon maximizing this equation, we have that 

$$\mathrm{AOQL} = \bar{p} \sum_{m=0}^{c} \frac{e^{-n\bar{p}}(n\bar{p})^{m}}{m!} \tag{3}$$

and 

$$(n)(\mathrm{AOQL}) = n\bar{p} \sum_{m=0}^{c} \frac{e^{-n\bar{p}}(n\bar{p})^{m}}{m!} \tag{4}$$

where p̄ is the abscissa value of maximization. Let a = (n)(AOQL); its 
values⁶ for integral values of c from zero to twelve are presented in 
Table I. Sample sizes which in combination with c yield selected values 
of AOQL are readily determined by dividing the (n)(AOQL) values by the 
given values of AOQL. Table II presents these sample sizes. 
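The maximization behind equations (3) and (4) can be checked numerically. The sketch below is not the authors' procedure (they point to published Poisson summation tables); it assumes a simple bisection on the derivative of x·P(c; x), where x stands for np, and all function names are illustrative.

```python
import math


def poisson_cdf(c: int, x: float) -> float:
    """Probability of c or fewer defectives when the expected count is x = n*p."""
    term = total = math.exp(-x)
    for m in range(1, c + 1):
        term *= x / m
        total += term
    return total


def maximizing_np(c: int) -> float:
    """Value of n*p at which p * P(c; n*p) peaks, found by bisection.

    d/dx [x * P(c; x)] = P(c; x) - x * pmf(c; x); it is positive for
    small x and negative at x = c + 2.
    """
    def slope(x: float) -> float:
        pmf_c = math.exp(-x) * x ** c / math.factorial(c)
        return poisson_cdf(c, x) - x * pmf_c

    lo, hi = 1e-9, c + 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def a_value(c: int) -> float:
    """a = n * AOQL for acceptance number c, as in equation (4)."""
    x = maximizing_np(c)
    return x * poisson_cdf(c, x)


def sample_size(c: int, aoql: float) -> float:
    """Unrounded sample size n = a / AOQL, with AOQL given as a fraction."""
    return a_value(c) / aoql


for c in (0, 1, 2):
    print(c, round(maximizing_np(c), 3), round(a_value(c), 4))
print(sample_size(1, 0.01))
```

The function `sample_size` returns the unrounded quotient; how the printed sample sizes were rounded to whole pieces is not stated in the text, so a computed value may differ by one from an individual table entry.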





5 Equation (2) differs from Dodge and Romig’s equation of AOQ, which is 

$$\mathrm{AOQ} = \sum_{M=0}^{N} \frac{N!}{(N-M)!\,M!}\, p^{M}(1-p)^{N-M} \sum_{m=0}^{c} \frac{M-m}{N}\cdot\frac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}}.$$

In words, this equation states that AOQ values, as calculated by the hypergeometric formula of the 
acceptance from a sample size n of c defectives pertaining to a lot of size N and M defectives, are weighted 
by the expected binomial frequencies of M defectives, lot size N and average incoming quality equal to 
p. Thus their assumption is that a lot is a sample from a stream of statistically controlled product 
varying according to the binomial distribution. The equation can easily be reduced to the summary form of 

$$\mathrm{AOQ} = p\left(1 - \frac{n}{N}\right)\sum_{m=0}^{c}\frac{n!}{(n-m)!\,m!}\,p^{m}(1-p)^{n-m}$$

and by substituting the Poisson for the Binomial the equation given for AOQ in their book is obtained 
(op. cit., p. 48, equation 15). It is to be noted that given our assumption that n/N is small our definition 
does not differ from theirs and also that with this assumption AOQ is made independent of the binomial 
form of distribution and of N. 

The writers wish to thank Mr. Dodge and Mr. Romig for their kindness in conveying to us by way 
of correspondence the underlying aspects of their definition of AOQ. 

6 Poisson summation tables of Grant or Molina may be used to calculate n(AOQL) values. Grant, 
E. L., op. cit., Table G, pp. 542-546. Molina, E. C., Poisson’s Exponential Binomial Limit, D. Van 
Nostrand, New York, 1947, Table II. 

TABLE I 
VALUES OF n(AOQL) = a 

 c     np̄*       Pa      n(AOQL)** 
 0    1.000    .367879     .3679 
 1    1.618    .519136     .8400 
 2    2.270    .604010    1.3711 
 3    2.945    .659552    1.9424 
 4    3.640    .698775    2.5435 
 5    4.349    .728499    3.1682 
 6    5.071    .751730    3.8120 
 7    5.804    .770495    4.4720 
 8    6.546    .786079    5.1457 
 9    7.297    .799148    5.8314 
10    8.055    .810388    6.5277 
12    9.590    .828740    7.9476 

* Accurate to .0005. 
** Accurate to .00005. 
TABLE II 
TABLE OF SAMPLE SIZES 

AOQL (per cent) across; acceptance number c down. 

 c  .10  .25  .50  .75  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0    6    7    8    9   10 
 0  367  147   73   49   36   24   18   14   12   10    9    8    7    6    5    4    4    3 
 1  840  336  168  112   84   56   42   33   27   24   21   18   16   13   11   10    9    8 
 2 1371  548  274  182  137   91   68   54   45   39   34   30   27   22   19   17   15   13 
 3 1942  776  388  258  194  129   97   77   64   55   48   43   38   32   27   24   21   19 
 4 2543 1017  508  339  254  169  127  101   84   72   63   56   50   42   36   31   28   25 
 5      1267  633  422  316  211  158  126  105   90   79   70   63   52   45   39   35   31 
 6            762  508  381  254  190  152  127  108   95   84   76   63   54   47   42   38 
 7                 596  447  298  223  178  149  127  111   99   89   74   63   55   49   44 
 8                      514  343  257  205  171  147  128  114  102   85   73   64   57   51 
 9                           388  291  233  194  166  145  129  116   97   83   72   64   58 
10                                     261  217  186  163  145  130  108   93   81   72   65 
12                                                    198  176  158  132  113   99   88   79 
























































c-p and c-N Zones of Minimum Average Inspection and the Construction 
of a Minimum Inspection Single Sampling Chart 
The formula for average number of pieces inspected per lot (I) is⁷ 

$$I = n_c + (N - n_c)\left(1 - \sum_{m=0}^{c}\frac{e^{-n_c p}(n_c p)^{m}}{m!}\right) \tag{5}$$





7 The subscripts of n are c and AOQL; the omission of AOQL is a matter of convenience in notation. 

Assume that N and AOQL are constant. Equation (5) then algebraically 
characterizes curves illustrated by those in Chart II for the value 
of c from zero to four, given AOQL=2% and N=1000.⁹ This chart is 
significant in three respects. First, it shows that as p increases 
inspection curves intersect and form zones of minimum average inspection 
for certain ranges of p. These are the popular c-p zones introduced 
by H. F. Dodge and H. G. Romig.¹⁰ Second, it is instructive in that as 
p varies from zero to one, the c factor giving minimum inspection varies 
parabolically with a maximum of three. That is, as p varies from zero 
to one, the variation of c forming minimum inspection zones is 0, 1, 2, 
3, 2, 1, and 0. This signifies that only sampling plans with these values 
of c yield minimum inspection. Sampling plans using c equal to or 
greater than four do not involve minimum inspection with any value 
of p. Third, it points out that the ratio p/AOQL=1 is contained within 
the zone formed by the maximum of the c values yielding minimum 
inspection. Chart III presents only the segments of the inspection 
curves of Chart II which form zones of minimum average inspection. 
This curve of Chart III is designated as the c-p minimum inspection 
curve. If it is assumed that p and AOQL are constant, then equation 
(5) represents inspection curves forming c-N zones of minimum inspection 
which are illustrated in Chart IV. In this case, the designation of 
c-N is attached to the minimum inspection curve.¹¹ 
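The c-p zones can be explored numerically by evaluating the average inspection for each acceptance number and taking the smallest. The following is a minimal sketch, assuming the Poisson form of the acceptance probability, unrounded sample sizes n_c = a_c/AOQL, and the a_c values of Table I for c = 0 to 4; the function names are illustrative.

```python
import math

# a_c = n_c * AOQL for c = 0..4, copied from Table I.
A = [0.3679, 0.8400, 1.3711, 1.9424, 2.5435]


def poisson_cdf(c: int, x: float) -> float:
    """Probability of c or fewer defectives when the expected count is x."""
    term = total = math.exp(-x)
    for m in range(1, c + 1):
        term *= x / m
        total += term
    return total


def average_inspection(c: int, p: float, lot_size: float, aoql: float) -> float:
    """Average pieces inspected per lot, using n_c = a_c / AOQL (unrounded)."""
    n_c = A[c] / aoql
    return n_c + (lot_size - n_c) * (1.0 - poisson_cdf(c, n_c * p))


def best_c(p: float, lot_size: float, aoql: float) -> int:
    """Acceptance number (0..4) giving the smallest average inspection."""
    return min(range(len(A)),
               key=lambda c: average_inspection(c, p, lot_size, aoql))


for p in (0.002, 0.01, 0.05):
    print(p, best_c(p, lot_size=1000, aoql=0.02))
```

For AOQL=2% and N=1000 the minimizing c rises and then falls again as p increases, which is the rise-and-fall pattern described above. Near a zone boundary the choice is sensitive to whether sample sizes are rounded to whole pieces.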
The equation 


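$$n_c + (N - n_c)\left(1 - \sum_{m=0}^{c}\frac{e^{-n_c p}(n_c p)^{m}}{m!}\right) = n_{c+1} + (N - n_{c+1})\left(1 - \sum_{m=0}^{c+1}\frac{e^{-n_{c+1} p}(n_{c+1} p)^{m}}{m!}\right) \tag{6}$$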


gives values of p and N demarcating the boundaries of c-p and c-N 
zones of minimum inspection respectively. The following equation, 
derived from equation (6) 





8 Fixing the value of AOQL determines the values of n_c. 

9 When only c is specified, the corresponding value of n_c is to be understood. 

10 Op. cit. 

11 c-AOQL zones also may be derived by holding constant p and N. However, these zones have 
only theoretical value and are not discussed here. The zones of minimum average inspection may be 
succinctly analyzed in terms of differences in sample sizes and amount of detailing as expressed by the 
equation 


$$I_k - I_c = (n_k - n_c) - \left[(N - n_c)P_{r,c} - (N - n_k)P_{r,k}\right], \qquad k = c+1,\ c+2,\ \dots$$








68 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1949 


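$$Np = \frac{p}{\mathrm{AOQL}}\cdot\frac{a_{c+1}\displaystyle\sum_{m=0}^{c+1}\frac{e^{-a_{c+1}p/\mathrm{AOQL}}(a_{c+1}p/\mathrm{AOQL})^{m}}{m!} - a_c\displaystyle\sum_{m=0}^{c}\frac{e^{-a_c p/\mathrm{AOQL}}(a_c p/\mathrm{AOQL})^{m}}{m!}}{\displaystyle\sum_{m=0}^{c+1}\frac{e^{-a_{c+1}p/\mathrm{AOQL}}(a_{c+1}p/\mathrm{AOQL})^{m}}{m!} - \displaystyle\sum_{m=0}^{c}\frac{e^{-a_c p/\mathrm{AOQL}}(a_c p/\mathrm{AOQL})^{m}}{m!}} \tag{7}$$

gives an alternate and easier method of determining the c values 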
yielding minimum inspection. Plotting Np against p/AOQL (Chart 
V), minimum inspection zones are described for Np and/or p/AOQL. 
Therefore, if N and AOQL are constant, the c number of the c-p zone 
of minimum inspection is read directly from Chart V. For instance, 
if AOQL=2%, N =1000, and p=1%, then Np=10 and p/AOQL=0.5 
and the coordinate falls in the zone of c=2, which result is the same 
as that of Chart III (or II). If p and AOQL are given, it follows that 
c-N zones of minimum inspection are obtained, so that if p and AOQL 
are each equal to 2% and N=400, then Np=8, p/AOQL=1 and the 
chart reads c=1. The same result is found in Chart IV. Therefore, 
Chart V presents a minimum inspection AOQL single sampling chart 
for known values of p and N. 
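The two readings of Chart V quoted above can be reproduced from the boundary equation directly. This is a sketch under stated assumptions: the boundary between zones c and c+1 is taken as Np = (p/AOQL)(a_{c+1}P_{c+1} - a_c P_c)/(P_{c+1} - P_c), with P the Poisson acceptance probabilities; the a_c values are those of Table I for c = 0 to 4; and returning infinity where the denominator is not positive is an interpretation of the vertical asymptotes, not something the paper states in this form.

```python
import math

# a_c = n_c * AOQL for c = 0..4, copied from Table I.
A = [0.3679, 0.8400, 1.3711, 1.9424, 2.5435]


def poisson_cdf(c: int, x: float) -> float:
    """Probability of c or fewer defectives when the expected count is x."""
    term = total = math.exp(-x)
    for m in range(1, c + 1):
        term *= x / m
        total += term
    return total


def boundary_np(c: int, ratio: float) -> float:
    """N*p on the boundary between zones c and c+1 at ratio = p/AOQL."""
    p_lo = poisson_cdf(c, A[c] * ratio)
    p_hi = poisson_cdf(c + 1, A[c + 1] * ratio)
    if p_hi <= p_lo:
        return math.inf  # assumed: no finite boundary past the asymptote
    return ratio * (A[c + 1] * p_hi - A[c] * p_lo) / (p_hi - p_lo)


def zone(np_value: float, ratio: float) -> int:
    """Zone number for a point (p/AOQL, Np); limited to c <= 4 by the list A."""
    c = 0
    while c < len(A) - 1 and np_value > boundary_np(c, ratio):
        c += 1
    return c


print(zone(10, 0.5))  # AOQL = 2%, N = 1000, p = 1%
print(zone(8, 1.0))   # p = AOQL = 2%, N = 400
```

The loop in `zone` assumes that, at a fixed p/AOQL, the boundaries increase with c, which holds for the values tried here.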

Chart V is of further interest in that it summarizes the characteristics 
of c-p and c-N zones of minimum inspection. In Chart VI the zones 
through which the dashed curves pass are the p/AOQL zones of mini- 
mum inspection for the designated values of N and AOQL and these 
zones are in direct proportion to c-p zones. It is readily seen, therefore, 
that c-p zones vary parabolically and that p/AOQL=1 is always con- 
tained in the zone of the maximum c of minimum inspection. Further- 
more, it is noted that the number of c zones and the maximum c vary 
directly with N. The zones through which the vertical lines of p/AOQL 
pass are in direct proportion to c-N zones. If p/AOQL is equal to or less 
than one all zones in Chart V (except that forming the boundary be- 
tween c=1 and c=0) converge to a point at Np=infinity and p/AOQL 
=0. Thus, in this region as N approaches infinity, c-N zones also ap- 
proach infinity. If p/AOQL is greater than one, all zones become verti- 
cally asymptotic for definite ranges of p/AOQL values so that as N in- 
creases c-N reaches a definite maximum value. For example, if p=1.8%, 
AOQL=1% then every value of N higher than 724 will have c=1 for 
minimum inspection since the zone boundaries are asymptotic at values 
of p/AOQL equal to approximately 1.79 and 2.24+. 
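The asymptotes and the lot-size threshold in this example can be approximated numerically. The sketch below assumes that a boundary becomes vertical where the two Poisson acceptance probabilities are equal, and it uses the four-decimal a_c values of Table I, so the results agree with the quoted figures only approximately. Function names and the bisection bracket are illustrative choices.

```python
import math

A = [0.3679, 0.8400, 1.3711]  # Table I values of a_c for c = 0, 1, 2


def poisson_cdf(c: int, x: float) -> float:
    """Probability of c or fewer defectives when the expected count is x."""
    term = total = math.exp(-x)
    for m in range(1, c + 1):
        term *= x / m
        total += term
    return total


def gap(c: int, ratio: float) -> float:
    """P_{c+1} - P_c at ratio = p/AOQL; the boundary is vertical where this is 0."""
    return poisson_cdf(c + 1, A[c + 1] * ratio) - poisson_cdf(c, A[c] * ratio)


def asymptote_ratio(c: int, lo: float = 1.0, hi: float = 4.0) -> float:
    """Bisect for the p/AOQL of the vertical asymptote of the c / c+1 boundary.

    Assumes gap > 0 at lo and gap < 0 at hi, which holds for c = 0 and c = 1.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gap(c, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def boundary_lot_size(c: int, p: float, aoql: float) -> float:
    """Lot size N on the boundary between zones c and c+1 for given p and AOQL."""
    ratio = p / aoql
    p_lo = poisson_cdf(c, A[c] * ratio)
    p_hi = poisson_cdf(c + 1, A[c + 1] * ratio)
    return (A[c + 1] * p_hi - A[c] * p_lo) / (aoql * (p_hi - p_lo))


print(asymptote_ratio(1), asymptote_ratio(0))
print(boundary_lot_size(0, p=0.018, aoql=0.01))
```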


AOQL Single Sampling Plans 

The selection of c yielding minimum inspection depends on a knowledge 
of p and therefore Np and p/AOQL. In practice these values are 
always unknown. However, it is known from Chart V that any arbitrary 
selection of a value of p/AOQL will give a value of c that forms a 
zone of minimum inspection. It is also known that the maximum c 
number of minimum inspection can be determined from N if the value 
of p/AOQL is assumed to be 1. This value of c will lead to minimum 
inspection for a certain range of p and deviate from minimum for 
other values of p. Thus, if this value of c is used as a substitute for those 
based on known values of p, inspection over the entire range of p would 
approximate minimum. This is shown in Chart VII, which compares 
the c=3 inspection curve (c=3 obtained by assuming p/AOQL=1 
and N=1000) and the c-p minimum curve of AOQL=2% and 
N=1000.¹² Similarly, it is known that the use of the ratio p/AOQL equal 
to zero gives the smallest c of minimum inspection, namely zero, and 
that minimum inspection is always obtained for values of p beyond 
2.24·AOQL.¹³ If this value of c is used as a substitute for those based on 
known p, a second approximation to minimum inspection is obtained 
for the overall variability of p. Chart VII gives a visual presentation of 
the approximation when c=0 is used. However, because of the rapid 
convergence of inspection curves to the value of N as p varies beyond 
2.24·AOQL, or stated differently, because of the large amount of 
detailing for any value of c beyond 2.24·AOQL (never less than 56%), 
it is of little significance from the economic point of view whether the 
minimum c=0 or some other value is used. The importance of 
approximation lies in the region of p less than 2.24·AOQL. In studying 
inspection curves of approximation within this region, it has been found 
that the ratio p/AOQL=0.5 gives a better approximation to minimum 
inspection than any other value of p/AOQL.¹⁴ For the entire range of p, 
with practical significance, the ratio of p/AOQL=0.5 leads to the 
selection of an inspection curve which best approximates the c-p 
minimum inspection curve. 





12 After the ratio p/AOQL has been assigned, the given value of AOQL (e.g. 2 per cent) is used only 
to determine the sample sizes. 

13 See Chart V. 

14 To obtain the same inspection curves of approximation given by p/AOQL=0.5, p/AOQL must 
necessarily vary in the region of p/AOQL>1 because of the asymptotic nature of the c zones in this 
region. 

The selection of c-values based on the assumption of p/AOQL=0.5 
is made easier by constructing Chart I. This chart eliminates the re- 
quirement of Chart II of calculating Np. Since from equation (7) 


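$$N = \frac{1}{\mathrm{AOQL}}\cdot\frac{a_{c+1}\displaystyle\sum_{m=0}^{c+1}\frac{e^{-a_{c+1}p/\mathrm{AOQL}}(a_{c+1}p/\mathrm{AOQL})^{m}}{m!} - a_c\displaystyle\sum_{m=0}^{c}\frac{e^{-a_c p/\mathrm{AOQL}}(a_c p/\mathrm{AOQL})^{m}}{m!}}{\displaystyle\sum_{m=0}^{c+1}\frac{e^{-a_{c+1}p/\mathrm{AOQL}}(a_{c+1}p/\mathrm{AOQL})^{m}}{m!} - \displaystyle\sum_{m=0}^{c}\frac{e^{-a_c p/\mathrm{AOQL}}(a_c p/\mathrm{AOQL})^{m}}{m!}} \tag{8}$$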

and since N, p=AOQL/2, c, (n_c)(AOQL)=a_c, and (n_{c+1})(AOQL)=a_{c+1} 
are given, we have that 

















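$$N = \frac{A}{\mathrm{AOQL}} \tag{9}$$

$$\log N = \log A - \log \mathrm{AOQL} \tag{10}$$

where 

$$A = \frac{a_{c+1}\displaystyle\sum_{m=0}^{c+1}\frac{e^{-\frac{1}{2}a_{c+1}}\left(\frac{1}{2}a_{c+1}\right)^{m}}{m!} - a_c\displaystyle\sum_{m=0}^{c}\frac{e^{-\frac{1}{2}a_c}\left(\frac{1}{2}a_c\right)^{m}}{m!}}{\displaystyle\sum_{m=0}^{c+1}\frac{e^{-\frac{1}{2}a_{c+1}}\left(\frac{1}{2}a_{c+1}\right)^{m}}{m!} - \displaystyle\sum_{m=0}^{c}\frac{e^{-\frac{1}{2}a_c}\left(\frac{1}{2}a_c\right)^{m}}{m!}} \tag{11}$$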


and is a constant. Thus, in logarithms the boundary lines between lot 
sizes and AOQL are parallel, straight and negatively sloped. 
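Because each boundary reduces to a constant value of N·AOQL, the zone lookup on Chart I can be imitated with a short table of constants. The sketch below assumes the Table I values of a_c for c = 0 to 10 (c = 11 is not tabulated, so the lookup stops at c = 10) and evaluates the boundary constant at p/AOQL = 0.5; the function names are illustrative and the constants are computed here, not taken from the chart.

```python
import math

# a_c = n_c * AOQL for c = 0..10, copied from Table I.
A = [0.3679, 0.8400, 1.3711, 1.9424, 2.5435, 3.1682,
     3.8120, 4.4720, 5.1457, 5.8314, 6.5277]


def poisson_cdf(c: int, x: float) -> float:
    """Probability of c or fewer defectives when the expected count is x."""
    term = total = math.exp(-x)
    for m in range(1, c + 1):
        term *= x / m
        total += term
    return total


def chart_constant(c: int) -> float:
    """Value of N * AOQL on the boundary between zones c and c+1 at p/AOQL = 0.5."""
    p_lo = poisson_cdf(c, 0.5 * A[c])
    p_hi = poisson_cdf(c + 1, 0.5 * A[c + 1])
    return (A[c + 1] * p_hi - A[c] * p_lo) / (p_hi - p_lo)


def acceptance_number(lot_size: float, aoql: float) -> int:
    """Zone of Chart I for a lot size and an AOQL given as a fraction."""
    product = lot_size * aoql
    c = 0
    while c < len(A) - 1 and product > chart_constant(c):
        c += 1
    return c


c = acceptance_number(1000, 0.01)
print(c, A[c] / 0.01)  # zone and unrounded sample size for AOQL = 1%, N = 1000
```

On log-log axes each constant gives the line log N = log A - log AOQL, so the computed boundaries are parallel with slope -1, in line with the statement above.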


Comparison of Charts I and V Given Knowledge of Average Incoming 
Quality (p) 

Chart V yields minimum inspection if the value of p for an incoming 
lot is known. Therefore, Chart V is useful in practice if p is known and 
if the variation of p is almost completely within a single zone of mini- 
mum inspection. In general, this will require a variation considerably 
less than the normal plus or minus three sigma limits of variability. 
Consequently, unless the variation of p is very small, Chart I gives a 
better approximation to minimum inspection because of the parabolic 
behavior of the c-p zones. 

ON MEASURING LANGUAGES 


STUART C. DODD, PH.D. 
University of Washington, Seattle 


This paper proposes ten criteria by which the suitability of 
any language for use as an international language might be 
measured. These criteria fall into two classes. The first three 
are criteria of familiarity—that is, they measure the extent to 
which a candidate language is already familiar to the people 
who would have to learn it. The remaining seven are criteria 
of excellence, and are intended to rate languages according to 
such properties as their freedom from local idioms, from ex- 
ceptions to the rules of grammar, from inflections, and so on. 
Such criteria have three purposes. First, they would rank the 
candidate languages by familiarity and excellence. Second, 
they would diagnose weaknesses in each candidate: from this 
diagnosis a living language could be simplified towards the 
ideal regularity of an artificial language, while preserving 
more of familiarity to the world’s population than an artificial 
language possesses. Finally, they would indicate any progress 
that the world may be making from decade to decade towards 
achieving a single language. 


CRITERIA FOR A “BEST” LANGUAGE 


THE PROBLEM OF an international auxiliary language has become in 
part a problem of selecting it from among the three hundred candidates 
which have been proposed in the last seventy years. To select the 
best candidate requires prior agreement on what is “best.” What are 
the criteria which specify the “best”? This paper proposes ten criteria. 
It further proposes ten indices which measure the degree to which each 
criterion is satisfied by a given candidate language. A weighted sum of 
these indices can then rank the candidates into a relative order. 

The criteria for the best world language may be put into two classes 
—the practical and the ideal. These are also called the natural vs the 
schematic types when applied to artificial languages. They specify, 
respectively, what is most likely to be adopted by the world and what is 
intrinsically the most excellent as a language. These will be referred to 
here as the criteria of familiarity and the criteria of excellence. 

For it should be obvious that the most practical proposal is one which 
involves the least change or the least amount of new learning for the 
world’s population. Thus the candidate language which has the largest 
proportion of elements which are already familiar to the maximum 
number of people will encounter least resistance. We can measure the 
degree of familiarity of a candidate language and thus make a crucial 
comparison to prove which candidates are most practical. 

THE FAMILIARITY CRITERION 


What language is most familiar to the users of any one language, in 
that it has the largest percentage of its words, grammatical forms or 
other elements the same between their language and the candidate 
world language? Several indices, varying in completeness, may be used 
to measure this. A first index of familiarity, which may be labelled F₁, 
is calculated by taking as a first step a representative sample of the 
elements of the candidate language. One such sample might be the 1000 
semantic words which occur most frequently, as determined from a 
semantic word count. (A “semantic” word is defined as a word or phrase 
with a unit meaning, e.g. “to look out for” meaning “to protect.”) 

The next step is to assign to each of these thousand “most frequent” 
words, which serve as a representative sample of the candidate 
language, a value of 1 if it is exactly the same in the national language and 
a value of ½ if it is partly the same (as in having a root or an affix in 
common). These unit or half unit values are added up and, since they 
will give a thousand points at maximum, this total, divided by the 
thousand words, will serve as a percentage of common vocabulary between 
one candidate language and one national language. This percentage is F₁.¹ 
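The scoring just described amounts to an average of the word values. A minimal sketch follows; the function name and the five-word sample are hypothetical, and the value 0 for a word with nothing in common is an assumption implied by, but not stated in, the text.

```python
def f1_index(values):
    """F1 = sum(V) / N as a percentage; V is 1, 0.5 or 0 for each sampled word."""
    allowed = {0, 0.5, 1}
    if not values or any(v not in allowed for v in values):
        raise ValueError("each value must be 0, 0.5 or 1, and the sample non-empty")
    return 100.0 * sum(values) / len(values)


# Hypothetical scores for a five-word sample (the paper suggests 1000 words).
scores = [1, 0.5, 0, 1, 0.5]
print(f1_index(scores))  # 60.0
```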

Next there will be other values of F₁, one for each national language 
paired with each candidate language. That is, for one candidate 
language there will be as many F₁’s as there are national languages or 
important groups of national languages deserving consideration in the 
world. This number of F₁ indices will then be multiplied by the number 
of candidate languages, i.e. the number of languages for which research 
provides these data and which are considered important enough to be 
likely candidates for a world language. It is obvious that this is an 
immense project of research for many scholars for many years. 

These F₁ indices of familiarity next must be combined into a net 
index of familiarity for each candidate language for the whole world. 
That is, each F₁ index for one candidate language must be weighted, 
or multiplied, by the number of people speaking the national language 
corresponding to that index. This gives greater importance to 
the familiarity index of a language spoken by 100 million people than 
to one spoken by one million people. From this democratic process of 
weighting each index by the population to which it applies, there will 
result a single net index of familiarity, which we may call F₂, for each 
candidate language. These indices will rank the different candidates in 




1 F₁ = (ΣV/N), where V = value of 1 or ½ and N = number of words in sample studied. 

order of familiarity to the world. The indices reveal the languages at 
the top of the list which deserve further study and also reveal those at 
the bottom of the list which may be dropped from further considera- 
tion. 
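The population weighting can be sketched as a weighted mean. Dividing by the total number of speakers is an assumption made here so that F₂ stays on the same percentage scale as F₁; the text specifies only that each F₁ is multiplied by its population. The language labels and figures are hypothetical.

```python
def f2_index(f1_by_language, speakers_by_language):
    """Population-weighted mean of F1 indices for one candidate language."""
    total = sum(speakers_by_language[name] for name in f1_by_language)
    if total <= 0:
        raise ValueError("total number of speakers must be positive")
    weighted = sum(f1_by_language[name] * speakers_by_language[name]
                   for name in f1_by_language)
    return weighted / total


# Hypothetical F1 percentages and speaker counts (in millions).
f1 = {"language A": 40.0, "language B": 10.0}
speakers = {"language A": 100, "language B": 1}
print(f2_index(f1, speakers))
```

In this made-up case the language with 100 million speakers dominates the net index, as the paragraph above intends.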

A number of problems of method will have to be solved in computing 
these indices. For example, in determining the number of people 
speaking each national language, in order to fix upon a weighting of its F₁ 
index, allowance must be made for bilingual people or that fraction of 
a population which may speak more than one language. Such persons 
might be counted as ½ for each of the two languages they may speak, 
thus giving them a total weighting the same as for any person speaking 
but one language. Again, a further refinement in the indices might be 
to weight each word in calculating the F₁ index in proportion to the 
frequency of occurrence of that word. A priori, it seems probable, 
however, that if the thousand most frequently used words are taken as the 
sample in calculating F₁, differences in the frequency of individual 
words would not greatly change the relative size of the F₁ indices. 

A third problem is whether to take the total population speaking a 
given national language as a weighting factor in calculating F₂ or 
whether to take some part of it which is more relevant for international 
purposes. Thus the literate part of each national population is probably 
a more suitable number to take as weighting coefficient. This index 
might be called F₃, measuring the degree of familiarity to literate 
people. For the literate population represents those who are 
communicating in international affairs more adequately than the host of 
illiterates. To include the illiterates would give the 400 million or more 
illiterates of China or India an importance greater than all the Western 
European nations combined. To weight each nation in proportion to 
its literate population would probably be a fairer basis, since part of 
learning an international language is learning its written forms. An 
index of familiarity should apply in part to the people who have already 
learned some written form of language and might have to unlearn and 
relearn an international language more than to the people who have 
learned no written form and to whom learning a new word would be 
little more difficult than learning their own national written forms. 


THE EXCELLENCE CRITERIA 


In analyzing next the excellence of any language for international 
communication, the following criteria are proposed as hypotheses. 
Some combination of criteria such as these would define what is meant 
by “the most excellent language” for international communication. 










The eight proposed criteria of excellence are that: its sentences 
should be idiomless and ordered in wording; its words should be uni- 
vocal in meaning, flectionless, phonetic in spelling and unique in pro- 
nunciation; its letters should be unique in sound and shape. 


A world language should be idiomless. It should not have phrases 
which are local and peculiar to one nation and cannot be literally 
translated into other languages. A world language should have all its 
phrases so logical as to enable literal translation into any national lan- 
guage. To measure the freedom from idioms of any language, a list of 
its idioms as found in a frequency count of a representative sample of 
perhaps a million words of prose should be made. The index, E₁,² would 
be the ratio of the million words of prose examined divided by that 
million plus the number of idioms (including repetitions) found in that 
representative body of prose. If many idioms are found, this ratio 
would be a small percentage. It would become 100%, indicating a 
language entirely free from idioms, only when no idioms are found. 
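As a small illustration, the idiom index can be computed directly from the two counts. The function name and the counts used below are hypothetical.

```python
def e1_index(words_examined: int, idioms_found: int) -> float:
    """E1 = N / (N + M) as a percentage: N words examined, M idioms found."""
    if words_examined <= 0 or idioms_found < 0:
        raise ValueError("need a positive word count and a non-negative idiom count")
    return 100.0 * words_examined / (words_examined + idioms_found)


print(e1_index(1_000_000, 0))        # 100.0: no idioms found
print(e1_index(1_000_000, 250_000))  # 80.0
```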

To detect an idiom, three tests are available. The first test is the 
definition of an idiom as a phrase different in meaning from its con- 
stituent words. Another test is to try translating each phrase into each 
of some dozen other representative languages and see whether that 
phrase can be translated literally. Another test is to see if each phrase 
can be expressed in the symbols of modern Symbolic Logic. This new 
science, grown up in the last half century, develops an algebra for words 
and sentences, so that these qualitative symbols can be handled in 
equations with all the precision of mathematics. 


A world language should have the order of the words in its sentences 
obey rules without exception. The rule of course may be very rigid; or 
very flexible as in stating that certain words may occur anywhere in 
the sentence depending on the emphasis desired. Ideally, it is possible to 
conceive of a language in which all word order is determined by one 
rule such as that “modifiers follow that which they modify.” This rule 
would mean that a verb followed the subject and that the object of 
the verb followed the verb whose meaning it completes. This rule would 
mean that adjectives followed the noun they modify and adverbs fol- 
low the verbs they modify and every phrase or clause follows whatever 
it modifies. 

The index, which we may label E₂, which measures the excellence of 
a sentence in having the order of its words abide by rule could be com- 





2 E₁ = N/(N+M), where N = number of words in sample studied; M = number of idioms. 

puted as a ratio of the number of words in a representative sample of 
prose (perhaps one million words) to the number of these words when 
each is multiplied by its “frequency rank”. This “frequency rank” 
needs explaining. It is determined as follows: For each word in a 
sentence the rule that determines its position in the sentence is decided 
upon. The frequency of occurrence of each rule must then be 
determined in a large sample of prose, and the rules put into a rank order so 
that the most used rule will be given a rank of 1, the next most used 
rule will be given a rank of 2, etc. Each word is then multiplied by the 
rank (whether 1, 2, 3, etc.) of the rule which governs its position in the 
sentence. This multiplication by a rank is “weighting” the word 
according to the frequency rank (in this case the frequency rank of the rule 
governing the position of the word in the sentence). By this index,³ 
if there is but one rule for all words the weighting factor is 1 and the 
index will be 100%, as it will be a million words divided by a million 
words. If, however, a second rule appears then some of the words in the 
denominator of the index will be multiplied by two and the index will 
be less than 100%. If a third rule appears the index will become still 
smaller in proportion to the number of words covered by that third 
rule. Thus this index becomes smaller in proportion as the number of 
rules becomes greater. 
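The rank weighting can be sketched as follows. This assumes each word in the sample has already been labelled with the rule governing its position; the rule labels are hypothetical, and ties in rule frequency are broken by order of first appearance, a detail the text does not address.

```python
from collections import Counter


def e2_index(rule_per_word):
    """E2 = N / sum(R_i) as a percentage, R_i being the frequency rank of each word's rule."""
    if not rule_per_word:
        raise ValueError("sample must contain at least one word")
    counts = Counter(rule_per_word)
    # Rank 1 goes to the most used rule, rank 2 to the next most used, and so on.
    ranks = {rule: rank
             for rank, (rule, _) in enumerate(counts.most_common(), start=1)}
    weighted_total = sum(ranks[rule] for rule in rule_per_word)
    return 100.0 * len(rule_per_word) / weighted_total


# Hypothetical four-word sample governed by two word-order rules.
sample = ["modifier follows", "modifier follows", "modifier follows", "free position"]
print(e2_index(sample))  # 80.0
```

With a single rule the index is 100%; each additional rule lowers it in proportion to the number of words that rule governs.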

A high index, therefore, measures simplicity of language in this re- 
spect and a low index measures its complexity or irregularity. The index 
also is proportional to the frequency with which each rule occurs in 
the representative sample that is studied. It should be obvious that 
having a definite word order makes sentences which are clear and 
unambiguous in meaning. If the order of words in a sentence always 
follows some rule, there is little possibility of different people inter- 
preting the sentence in different ways. Thus a rule-abiding order of 
words is an objective way of measuring and controlling the degree of 
ambiguity in the sentences of a language. This is especially so in a 
language whose words are not inflected (as explained below). 


A world language should have words which are uninflected. This cri- 
terion means that no word should ever change its form to express 
a grammatical inflection such as masculine or feminine gender, per- 
sons, or number, tense, voice or mood of a verb or degrees of an ad- 
jective. This is the trend of evolution of language. Languages grow 
up with these grammatical inflections in primitive thinking as when 





3 E₂ = (N/ΣRᵢ), where N = number of words in sample studied; Rᵢ = frequency rank of rule governing 
the position of each word. 

man ascribed masculine and feminine gender to all nouns, simply 
because man thought of his own difference in sex as existing in every- 
thing else around him. But as people developed towards greater ma- 
turity and flexibility in language they dropped these grammatical 
inflections. Some of them are entirely unnecessary. Others are ex- 
pressed in uninflected “particles” such as the prepositions and con- 
junctions and adverbs like “to,” “as,” “and,” “or,” “not,” etc. Chinese 
has gone furthest in developing a completely uninflected language of 
root words which can be flexibly combined in different orders to make a 
great variety of meanings. 

The flexibility of uninflected words can be compared to the flexibility 
of the alphabet, where the clumsy symbols for whole syllables were 
replaced by a few letter symbols for elemental sounds. These letters 
can be flexibly combined to make any word in any of the world’s languages. 
Somewhat similarly, root words and particles yield more flexible 
sentences with a greater range of possible meanings than inflected 
words can. 

To measure the degree to which a language has progressed towards 
the ideal of complete absence of inflections, an index, which may be 
called $E_3$, may be calculated from the same representative body of 
prose of perhaps a million words which may be used for calculating 
most of the indices discussed in this paper. The index 
of freedom from inflections is the ratio of one million words divided by 
those million words each weighted by its “frequency rank of 
inflection.” This “frequency rank of inflection” is determined in a way 
similar to the frequency rank of rules in the preceding index, $E_2$. To get it, 
the number of times each inflection occurs in the million words is 
counted, and the inflections with one grammatical 
meaning are given the ranks of 1, 2, 3, etc. in order of frequency. Each 
word is then multiplied by 1 if it is uninflected, by 2 if its first 
inflection is the most frequently occurring one, by 3 if its first 
inflection is the second most frequent, by 4 if its first inflection is 
the third most frequent, etc. If the word has more than one inflection, 
it will be multiplied by more than one such rank. By this index,⁴ a 
language will be 100 per cent flectionless only when it uses root words 
and particles only. It will be less than 100% flectionless in 
proportion as: 


a. It has many words which are inflected, 
b. The inflected words are frequent in occurrence, and 
c. There is more than one inflection to express one grammatical 
meaning. 


Thus a language which has four conjugations for its verbs instead of 
one conjugation will have a larger weighting in the denominator of the 
index and, therefore, a lower index of excellence in respect to being 
flectionless. 

4 $E_3 = N / \sum R_3$, where $N$ = number of words in sample studied, and $R_3$ = frequency rank of each 
inflection of each word. 
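As a rough illustration of how such a frequency-rank index could be tallied, the following Python sketch works on a toy sample. The tagging of each word with at most one inflection label, the labels themselves, and the pooling of all inflections into a single ranking are assumptions made for the example; they are not part of the procedure described above.

```python
from collections import Counter

def flectionless_index(inflections):
    """Return N / (sum of weights) for a sample of words.

    `inflections` holds one entry per word: None for an uninflected word,
    otherwise a label naming its single inflection (labels are hypothetical).
    """
    counts = Counter(tag for tag in inflections if tag is not None)
    # The most frequent inflection gets weight 2, the next 3, and so on;
    # uninflected words keep weight 1.
    weight = {tag: rank + 2 for rank, (tag, _) in enumerate(counts.most_common())}
    total = sum(1 if tag is None else weight[tag] for tag in inflections)
    return len(inflections) / total

sample = [None, "-ed", None, "-s", "-ed", None, "-ed", "-s", "-en", None]
print(flectionless_index(sample))  # 10 words over a weighted total of 20
```

A sample with no inflected words gives an index of 1, i.e. 100 per cent.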


A world language should be phonetic in spelling. This criterion of ex- 
cellence that every word should be spelled exactly as it is pronounced 
implies the criterion mentioned below that every letter should represent 
only one sound. When the words of a language are spelled as they 
are pronounced, learning to read that language becomes very sim- 
ple. If there is much literature and reading matter in one’s environ- 
ment, a child will learn to read without schooling as automatically 
as he learns to speak by merely being surrounded by people using the 
written language and by his wanting to know what others are writing 
and to write things himself. A phonetic spelling is perhaps the greatest 
aid to making the population 100% literate. All languages which use 
letters were phonetically spelled at one time of course, but in the case 
of many languages the spelling of a previous century has become stand- 
ardized while the pronunciation has changed. Another source of un- 
phonetic spelling, however, is that there are more sounds in a language 
than letters, so that some letters will be used to mean more than one 
sound. Thus English uses 40 sounds, but has only 26 letters in its 
alphabet, with the result that its irregularity of spelling is greatly 
increased. 

To measure the degree to which a language is phonetic in spelling, 
an index of this criterion of excellence, which we may call $E_4$, may be 
defined by a ratio calculated from a large sample of perhaps 100 
thousand letters as they occur in the representative sample of prose 
referred to above. The index might be 100 thousand letters divided by 
the number of those letters when each one is multiplied by its 
“frequency rank of pronunciation.” This frequency rank of pronunciation 
is again calculated similarly to the frequency rank of rules in $E_2$ or 
the frequency rank of inflections in $E_3$. To calculate it, the frequency 
with which each pronunciation of each letter recurs must be counted. Then 
for any one letter, its most frequent pronunciation is given a rank of 1, 
its next most frequent pronunciation a rank of 2, and so on. 
Each letter in the denominator of the ratio is multiplied by its rank and 
these products are added to make the denominator of $E_4$. By this index, 
a language will be 100% phonetic only when each letter has one 
pronunciation and when every word is spelled in a single and phonetic way. 
The phonetic index⁵ $E_4$ of a language will decrease in proportion as 
its letters, as they occur in words with current spelling, have more than 
one pronunciation. 


A world language should have words which are univocal in meaning. This 
criterion of excellence means that every word should ideally have only 
one meaning and every meaning should have only one word to symbol- 
ize it. There should be no words with multiple meanings nor should 
there be any synonyms which mean exactly the same. (Synonyms with 
slightly different meaning are desirable to express shades of differences 
in meaning and to make a language rich, but words between which 
no differences in meaning can be detected are merely confusing.) This 
is a fundamental principle of symbolism—that each symbol should 
represent one and only one “referent” or meaning. Obviously, our living 
languages as they have grown up in folk usage have acquired multiple 
meanings for many of their words. Only artificial languages such as 
Esperanto approach the ideal of “one word, one meaning” as they can 
start out afresh by assigning a word or phrase for every meaning listed 
in the dictionary. 

To measure the excellence of language in respect to its words being 
unique in meaning, an index, $E_5$, may be defined as a ratio calculated 
from the same representative sample of a million words of prose which 
has been used previously. This index is one million words divided by 
the number of those words when each is multiplied by its “frequency 
rank of meaning.” This frequency rank of meaning is similar to previous 
frequency ranks. It would require a semantic word count, i.e. a count 
of the frequency of occurrence of each meaning (as listed in the 
dictionary for each word) in the million word sample of prose. (See Eaton’s 
Semantic Frequency List for English, French, German and Spanish.) 
Each meaning of each word will be given a rank of 1 if it occurred 
most frequently, of 2 if it occurred next most frequently and so on. 
Each word would be multiplied by this frequency rank and all these 
products would be added up to get the denominator of $E_5$.⁶ Since no 
complete semantic word counts have been made as yet in the world to 
the author’s knowledge (although a scientific committee is at work on 
this in the United States), a similar index of uniqueness of meaning of 
words may be $E_{5a}$. $E_{5a}$ might be the number of words in the most 
complete dictionary of a language divided by the number of meanings 
listed in that dictionary. This variant index is easily computed but has 
the disadvantages of giving great weight to unusual and archaic 
meanings with which a language may be burdened and of ignoring the 
important factor of the frequency of the use of a word with multiple meanings. 

5 $E_4 = N / \sum R_4$, where $N$ = number of letters in sample studied; $R_4$ = frequency rank of 
pronunciations of each letter. 
6 $E_5 = N / \sum R_5$, where $N$ = number of words in sample; $R_5$ = frequency rank of meaning of each word. 


The world language should have uniform pronunciation everywhere. 
This sixth criterion of excellence means that there should be no 
difference in different countries in the way any word of the world language is 
pronounced. By means of the international standardized phonetic 
alphabet the standard pronunciation of every word can be fixed. 
Phonograph records and radio recordings can also fix the 
pronunciation. Some people may ask: since languages have changed 
their pronunciation in the past, would not the new international 
language also change, as a whole or in regional dialects? This is highly 
improbable, as modern forces such as radio and other agencies of mass 
communication would increasingly tend to unify, standardize and 
preserve pronunciation. Dialects grow only where people are separated 
with little communication between them. 

To measure the degree to which any candidate language approaches 
this ideal of universally uniform pronunciation an index of uniformity 
of pronunciation, $E_6$, may be developed. For this index, a survey of 
perhaps a million words of oral speech would be needed. In this survey, 
a sample of persons representative of the various regions, social classes, 
etc. within each national language might be asked to read standardized 
prose into a recording machine. From these recordings, the frequency 
of each pronunciation of each word could be counted, and each 
pronunciation of a word given a rank. Then the index $E_6$⁷ would be that 
million words divided by the sum of those words when each has been 
multiplied by its frequency rank of pronunciation. This index, like 
the previous ones, becomes 100%, showing complete uniformity of 
pronunciation, when the rank of every word is one so that the index is one 
million divided by one million. In proportion as there is more than 
one pronunciation for each word, the denominator increases and the 
index of uniform pronunciation shrinks. For example, if there were two 
pronunciations only on the average for every word, the ranks of 1 and 2 
would occur equally often as weights in the denominator, which would 
then have the average value of 1.5, giving an index of one million 
divided by a million and a half, which is 67% of uniform pronunciation. 
Again, if there were, in general, three pronunciations of every word so 
that the ranks of 1, 2, and 3 occurred about equally often, then the 
denominator would be twice the size of the numerator and the degree 
of the uniformity of pronunciation would be only 50%. 

7 $E_6 = N / \sum R_6$, where $N$ = number of words; $R_6$ = frequency rank of pronunciation of each word. 
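The two numerical cases just given can be checked with a few lines of Python. This is only a sketch: the list lengths are arbitrary, and the function name is introduced here for illustration.

```python
def rank_index(ranks):
    """Number of words divided by the sum of their frequency ranks."""
    return len(ranks) / sum(ranks)

two_way = [1, 2] * 500        # ranks 1 and 2 occurring equally often
three_way = [1, 2, 3] * 300   # ranks 1, 2 and 3 occurring equally often
print(round(rank_index(two_way), 2))    # 0.67
print(round(rank_index(three_way), 2))  # 0.5
```

The same function applies to any of the indices that are defined as a count divided by a sum of frequency ranks.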


A world language should have every letter unique in shape and sound. 
These two final criteria of excellence of any language apply only to 
its written and printed forms. The second means that every letter 
should have only one pronunciation and every elemental sound in the 
language should have a letter to represent it. The index to measure 
this is included in the index of phonetic spelling, $E_4$, above. 

For each letter to be unique in shape means that every letter would 
have one and only one visual form, regardless of whether it occurs in 
print or in handwriting, or whether at the beginning of the word 
(where capitals are used in some languages), or in the middle or end 
of a word. Thus English has four forms for many of its letters and 
Arabic has three forms for many of its letters. To measure the degree 
of uniqueness of shape of letters more exactly a seventh index of 
excellence, $E_7$, may be defined as a ratio whose numerator is a sample 
of a hundred thousand letters in the representative samples of written and 
printed prose. This numerator would be divided by the sum of those letters 
each multiplied by its frequency rank of shape. This frequency rank of 
shape, like the preceding frequency ranks, would be determined from a 
count of the frequency of each shape of each letter. Putting them into 
rank order, multiplying each letter by its rank and adding these 
products gives the denominator of the index $E_7$.⁸ It will be 100% only 
when every letter (including its connection to an adjacent letter) has 
only one shape. 

Seven indices of excellence for any language have been defined above. 
The next scientific step is to combine them into a single index of excel- 
lence for any one candidate language. There are various possible ways 
of combining them. The simplest way is to draw a profile graph. This 
means to draw a column showing the percentage value of an index and 
placing the seven columns for the seven indices of one candidate lan- 
guage side by side. The broken line across the tops of these seven col- 
umns is the “profile” for that language. By drawing and superposing 
profiles for the different candidate languages it might be obvious that 
one or two are far superior in most respects to all the others (or possibly 
far inferior to the others and so may be dropped from further consider- 
ation). 





8 $E_7 = N / \sum R_7$, where $N$ = number of letters in sample studied and $R_7$ = frequency rank of shape of 
each letter. 

If the profile, however, shows several candidates with overlapping 
profiles a more exact method of combining the seven indices into one 
must be used. The simplest way is to get the simple unweighted 
average, $E_a$, by adding them together and dividing their sum by 7. This 
gives a simple average index of excellence for one language, permitting 
its excellence to be compared with the excellence of other candidate 
languages. If more refined weighting is desired, it can be secured by 
having panels of judges who are experts in the science of language 
distribute 100 points to the 7 criteria so as to show the relative 
importance of each. The average number of points assigned by the judges to 
each criterion would then be a weighting factor for that criterion. This 
weighting factor for each criterion would be multiplied by the value of 
its index before adding the 7 indices and dividing by the sum of the 
weights to get the weighted average index of net excellence for one 
language, $E_w$.⁹ 
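The weighted average can be sketched in Python as follows. The seven index values and the judges' points below are invented purely for illustration; only the form of the calculation (sum of weight times index, divided by the sum of the weights) is taken from the text.

```python
def weighted_index(indices, weights):
    """Weighted average: sum of weight * index divided by sum of weights."""
    if len(indices) != len(weights):
        raise ValueError("one weight per index is required")
    return sum(w * e for w, e in zip(weights, indices)) / sum(weights)

# Hypothetical values of the seven indices and judges' points (totalling 100).
indices = [0.90, 0.80, 0.70, 0.60, 0.75, 0.85, 0.50]
weights = [20, 15, 15, 20, 10, 10, 10]
print(weighted_index(indices, weights))
print(sum(indices) / len(indices))  # unweighted average, for comparison
```

With equal weights the function reduces to the simple unweighted average.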

Still more refined weighting schemes could be developed such as one 
based upon the number of man-hours, or the amount of human energy, 
required to learn and use whatever each index measures. Thus if un- 
phonetic spelling adds 20% of letters to the words of the language in 
general then the writing, typing, and type setting of that language re- 
quires 20% more time than a language having phonetic spelling. Simi- 
larly, the number of hours required on the average to learn the irregu- 
lar flections of a language compared with the number of hours required 
to learn an otherwise equivalent but flectionless language would yield 
a weighting factor for the third criterion dealing with flections. 


THE PURPOSE OF THE CRITERIA 


As a result of the researches outlined above there would be an index 
of familiarity and an index of excellence (such as F; and Ew) for each 
of the languages, whether an artificial or a living one, which are candi- 
dates to become the auxiliary world language. These indices will 
serve three purposes. First they would rank the candidate languages 
and tell which was the most familiar and the most excellent. Thus the 
problem of choosing the “best” world language would find a scientific 
answer (based on rules in which the subjective element has been 
minimized). 

Secondly, these indices would diagnose and measure weakness and 
the degree of strength in each candidate language, whether in its 
freedom from idioms, its regularity in word order, its freedom from flections, 
its phonetic spelling, its uniqueness of meaning of words, its uniformity 
of pronunciation or its uniqueness of the sound and the shape of its 
letters. From this diagnosis any living language could be simplified 
towards the ideal regularity of an artificial language while preserving 
more of familiarity to some part of the world’s population than 
artificial languages possess.¹⁰ 

9 $E_w = \sum wE / \sum w$, where $E$ = each of the preceding indices of excellence $E_1$ to $E_7$ in turn; $w$ = weight 
of each index. 

A third purpose of these indices is to measure any progress that the 
world may be making from decade to decade towards achieving a single 
world language. The relative degree of gain among the rival languages 
may be measured from period to period partly by techniques of 
representative sampling as in Gallup polls. The degree of a person’s 
knowledge of a language, averaged for a population, must also be included 
in any accurate measurement. Any trend, however slight, towards a 
single language eventually sweeping the field and becoming the sole 
auxiliary language would be shown, and its spread could be facilitated. 





10 This has been done for English. The resulting “Model English,” constructed by the author, has 
the regularity of the most ideal artificial language coupled with greater familiarity to more people than 
any rival national or synthetic language. Its indices of excellence are all 100% and its indices of 
familiarity are well above all rivals. This gives Model English first rank by all criteria for a world language. 

















CONFIDENCE LIMITS IN THE NON-PARAMETRIC 
CASE 


GOTTFRIED E. NOETHER 
Columbia University 


The purpose of this article is to give a survey¹ of certain 
methods available for finding confidence limits when nothing 
is assumed about the population from which a sample has been 
drawn except possibly that it has a continuous distribution. 
The following three cases are treated: a confidence band for 
the unknown cumulative distribution function, a confidence 
interval for the proportion of a population for which the vari- 
ate is smaller (or larger) than a given value, and confidence 
intervals for quantiles. The results have been known for many 
years, but have often been accessible only to those who were 
able to follow rather involved mathematical arguments. It is 
the purpose of this paper to state certain important results 
without referring to any mathematical proofs. 


Introduction. In the application of statistical theory to practical 
problems the normal distribution occupies a predominant position. 
If we can assume that the observations which we have taken have come 
from a normal population a great many of our troubles are over. How- 
ever it will often happen that we do not know much more about the 
parent population than is supplied by the sample itself. What can we do 
then? The arbitrary assumption of normality may obviously lead to 
completely wrong conclusions. Having only a scant knowledge about 
the parent population we are forced to make very broad assumptions. 
Thus in many cases it may be reasonable to assume that the unknown 
cumulative distribution function (cdf) is continuous. This will be our 
assumption in all that follows unless stated otherwise. In contrast 
to the parametric case when the form of the cdf is supposed to be known 
except for the values of a finite number of parameters this is often 
referred to as the non-parametric case, since a finite number of parame- 
ters are not sufficient to determine the distribution completely. 

One of the important problems in the parametric case is the es- 
timation of the unknown parameter—for simplicity we assume that 
there is just one—by a confidence interval. The question arises if the 
idea of a confidence interval can be extended to the non-parametric 
case. Such an extension was made by Wald and Wolfowitz [1]. 





1 I should like to thank Professor Wolfowitz for pointing out the need for such a survey. 

Confidence Bands for Unknown Cumulative Distribution Functions. 
Before going into the non-parametric case it may be well to review 
briefly the basic idea underlying confidence intervals in the 
parametric case. Let the random variable $X$ have the cdf $F(x, \theta)$, i.e., 
$P\{X \le x \mid \theta\} = F(x, \theta)$, where $P\{A \mid B\}$ as usual denotes the probability 
of $A$ computed under the assumption that $B$ is true. The functional 
form of $F(x, \theta)$ is supposed to be known. The only unknown quantity 
is the true value $\theta_0$ of the parameter $\theta$. 

The next best thing to knowing the exact value $\theta_0$ would be to have 
an upper and lower bound for $\theta_0$. Thus we are led to try to determine 
two numbers $U(x_1, \ldots, x_n)$ and $L(x_1, \ldots, x_n)$ depending on $n$ 
observations $x_1, \ldots, x_n$ on the random variable $X$ such that 
$L(x_1, \ldots, x_n) \le \theta_0 \le U(x_1, \ldots, x_n)$. However this is only possible if we are willing 
to take a definite risk of obtaining incorrect limits. Thus we can fix 
a confidence coefficient $\alpha$, $0 < \alpha < 1$, and then determine two expressions 
$L$ and $U$ (for simplicity we omit from now on to indicate that $L$ and 
$U$ depend on the observed sample values) in such a manner that 
$L \le \theta_0 \le U$ with probability $\alpha$. In other words, if we perform a great 
many experiments, compute each time $L$ and $U$ corresponding to the 
same confidence coefficient $\alpha$, and state every time that the true 
parameter value $\theta_0$ lies between $L$ and $U$, we shall in the long run 
make correct statements $100\alpha\%$ of the time. 

In the case when the form of the distribution function is known 
except for several unknown parameters confidence regions can be 
defined in a similar manner. However, as we have seen, if we assume 
only that the unknown cdf is continuous a finite number of parameters 
is no longer sufficient to specify the distribution completely. Now in 
order to know $F(x)$, we have to know its value for every $x$, 
$-\infty < x < +\infty$. Thus instead of looking for two numbers $L$ and $U$ as in the 
parametric case we should now look for two functions $L(x)$ and $U(x)$ 
defined for all $x$ and then state that 


(1)    $L(x) \le F(x) \le U(x), \quad -\infty < x < +\infty.$ 


As in the parametric case we ought to indicate that both $L(x)$ and 
$U(x)$ depend on the observations $x_1, \ldots, x_n$, but for simplicity shall 
again omit to do so. As before, it is impossible to determine $L(x)$ 
and $U(x)$ in such a way that (1) is always true, but again we can fix 
a confidence coefficient $\alpha$, $0 < \alpha < 1$, such that in the long run (1) is 
true $100\alpha\%$ of the time. We shall say that $L(x)$ and $U(x)$ determine a 
confidence band for the unknown cdf $F(x)$ corresponding to the 
confidence coefficient $\alpha$, meaning that the band determined by the graphs 
of $L(x)$ and $U(x)$ covers the graph of $F(x)$ completely with 
probability $\alpha$. 

As in the parametric case there are infinitely many ways of deter- 
mining L(x) and U(x). Very little work has been done so far in finding 
confidence limits in the non-parametric case which could be termed 
best. However from the standpoint of facility of application one class 
of functions L(x) and U(x) seems to have definite advantages. Before 
we can describe this class of functions we have to define first what we 
mean by the sample cdf. 

Let again $x_1, x_2, \ldots, x_n$ be a sample of size $n$ from a population 
having cdf $F(x)$. For simplicity we shall assume that $x_1 \le x_2 \le \cdots \le x_n$. 
Since we are not interested in the order in which the sample was 
drawn this is no restriction. The sample or empirical cdf $F_n(x)$, as it is 
sometimes called, is now easily constructed. It is the step function 
which is equal to 0 for $x < x_1$ and equal to 1 for $x \ge x_n$, while increasing 
by $1/n$ at each of the values $x_i$, $i = 1, 2, \ldots, n$. Thus we can write 
$F_n(x) = (1/n)$ (the number of observations which are $\le x$). It can be 
shown that as $n \to \infty$, $F_n(x)$ converges stochastically to $F(x)$, or, in other 
words, that as $n$ increases we can be almost certain that $F_n(x)$ 
approaches $F(x)$ more and more closely. 
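The construction of the sample cdf can be sketched in Python. The observations used below are made up, and the function names are introduced only for this illustration.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n as a function: the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_n(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return f_n

f_n = empirical_cdf([2.1, 0.4, 3.3, 1.7])   # made-up observations
print(f_n(0.0), f_n(1.7), f_n(5.0))         # 0.0 0.5 1.0
```

Sorting the sample first corresponds to the assumption that the observations are arranged in increasing order.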


Determination of $L(x)$ and $U(x)$. The convergence of $F_n(x)$ to $F(x)$ 
which we have just stated suggests the following method of defining 
the lower and upper boundary of a confidence band for $F(x)$: 


(2)    $L(x) = \begin{cases} F_n(x) - d & \text{if } F_n(x) - d > 0 \\ 0 & \text{otherwise} \end{cases}$ 

       $U(x) = \begin{cases} F_n(x) + d & \text{if } F_n(x) + d < 1 \\ 1 & \text{otherwise} \end{cases}$ 


where $d > 0$ is a constant determined in such a way that (1) is satisfied 
with probability $\alpha$. Obviously, $d$ is a function of $\alpha$ and $n$. 
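A minimal Python sketch of the two boundaries in (2) follows. The half-width d used here is an arbitrary illustrative number, not a value taken from any table.

```python
def band_limits(fn_value, d):
    """Clip F_n(x) - d and F_n(x) + d to the interval [0, 1], as in (2)."""
    lower = fn_value - d if fn_value - d > 0 else 0.0
    upper = fn_value + d if fn_value + d < 1 else 1.0
    return lower, upper

d = 0.25  # illustrative half-width only
for fn_value in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(fn_value, band_limits(fn_value, d))
```

Since the sample cdf takes only the values 0, 1/n, ..., 1, the band is fully described by n + 1 such pairs.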

No formula is available to determine the value of $d$ for any given 
$\alpha$ and $n$. However, Wald and Wolfowitz [1] have shown how to 
compute $\alpha$ when $d$ and $n$ are given. Though it would appear from (1) that 
$\alpha$ should also depend on $F(x)$, this is, fortunately, not the case. Thus a 
double entry table for $\alpha$ corresponding to values of $d$ and $n$ could be 
computed. From such a table the value of $d$ which for a fixed $n$ 
corresponds to a given value of $\alpha$ could be found by interpolation. 

The following scheme shows the computation that has to be 
performed to find $\alpha$. Compute $2n$ numbers, $a_i$ and $b_i$, $i = 1, 2, \ldots, n$, 





92 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1949 














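where 

$a_i = \begin{cases} \dfrac{i - nd}{n} & \text{if } \dfrac{i - nd}{n} > 0 \\ 0 & \text{otherwise,} \end{cases}$ 

$b_i = \begin{cases} \dfrac{i - 1 + nd}{n} & \text{if } \dfrac{i - 1 + nd}{n} < 1 \\ 1 & \text{otherwise.} \end{cases}$ 

Define $n$ functions $P_{k+1}(x)$, $k = 0, 1, \ldots, n-1$, with the help of 
the recursion formula 


$P_0(x) \equiv 1, \quad P_{k+1}(x) = \int_{a_{k+1}}^{x'} P_k(t)\,dt,$ 


where 

$x' = \begin{cases} x & \text{if } x < b_{k+1} \\ b_{k+1} & \text{otherwise.} \end{cases}$ 

Then 


$\alpha = n!\,P_n(1).$ 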


It is easily seen how the fact that we have to consider various upper 
limits at each application of the recursion formula makes the 
computation of $\alpha$ very cumbersome. However there is a very good 
approximation to $\alpha$ involving considerably less computational work. Set $\alpha = 2\bar{\alpha} - 1$, 
where $\bar{\alpha}$ is determined in the following way. Take the same $a_i$, 
$i = 1, 2, \ldots, n$, as before. Define $n$ functions $\bar{P}_{k+1}(x)$, $k = 0, 1, \ldots, n-1$, 
with the help of the recursion formula 


$\bar{P}_0(x) \equiv 1, \quad \bar{P}_{k+1}(x) = \int_{a_{k+1}}^{x} \bar{P}_k(t)\,dt.$ 


Then 

(3)    $\bar{\alpha} = n!\,\bar{P}_n(1).$ 


Now only one integration is necessary at each application of the 
recursion formula. This procedure can be reduced to the evaluation of 
a determinant of order $n+1$ [1]. The above approximation of $\alpha$ with the 
help of $\bar{\alpha}$ is such that it increases our protection in the sense that the 
true probability of our confidence band covering $F(x)$ completely may 
be larger, but is never smaller than $2\bar{\alpha} - 1$. 
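The simpler recursion leading to (3) involves only polynomials, so for small n it can be carried out directly. The sketch below stores each polynomial as a list of coefficients and integrates term by term. It is a plain floating-point illustration, not the determinant method of [1], and cancellation may make it unreliable for large n. The values n = 5 and d = 0.4 are arbitrary.

```python
from math import factorial

def one_sided_alpha(n, d):
    """Evaluate n! * P_n(1), where P_0 = 1 and P_{k+1}(x) is the
    integral of P_k from a_{k+1} to x."""
    a = [max((i - n * d) / n, 0.0) for i in range(1, n + 1)]
    coeffs = [1.0]  # polynomial coefficients, lowest power first
    for lower in a:
        # Antiderivative with zero constant term ...
        anti = [0.0] + [c / (j + 1) for j, c in enumerate(coeffs)]
        # ... shifted so that it vanishes at the lower limit.
        anti[0] = -sum(c * lower ** j for j, c in enumerate(anti))
        coeffs = anti
    return factorial(n) * sum(coeffs)  # value at x = 1 is the coefficient sum

alpha_bar = one_sided_alpha(5, 0.4)
print(alpha_bar, 2 * alpha_bar - 1)  # one-sided value and two-sided lower bound
```

For n = 1 the function returns d, as it should: a single observation then satisfies the one-sided condition with probability d.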
Discontinuous cdf’s. As mentioned in the beginning we have assumed 
throughout that the unknown cdf is continuous. Under this assumption 
the probability that two sample values are equal is zero. In practice, 
however, due to limitations of measurement, it is quite possible that 
two or more sample values turn out to be equal. In such a case it may 
happen that for some sample value $x = x_i$, say, the upper limit $U(x)$ 
for $x < x_i$ is lower than the lower limit $L(x)$ for $x \ge x_i$, making it 
impossible to draw a continuous cdf between $L(x)$ and $U(x)$. In such a case 
we shall be justified in asserting that the true distribution has a 
discontinuity at $x = x_i$. 

The question arises what is the probability that our confidence band 
constructed as described above covers completely the true cdf if it 
should be discontinuous. We can no longer state that this 
probability is exactly $\alpha$, but it has been shown that in this case we have 


$P\{L(x) \le F(x) \le U(x)\} \ge \alpha,$ 


so that actually our protection is better than claimed. 


Asymptotic Results. It is evident that it would be very desirable to have 
tables of the kind described earlier. In the meantime certain 
asymptotic results are available. If we let 


(4)    $\lambda = d\sqrt{n},$ 


Smirnoff [2], generalizing a result by Kolmogoroff, has shown that 


(5)    $\lim_{n \to \infty} \alpha = 1 - 2\sum_{j=1}^{\infty} (-1)^{j+1} e^{-2j^2\lambda^2}.$ 


Since (5) contains a very fast converging series, it is not difficult to 
compute $\lambda$ corresponding to a given confidence coefficient $\alpha$ to any 
desired degree of accuracy. The corresponding value of $d$ is easily 
determined from (4) to be 


(6)    $d = \lambda / \sqrt{n}.$ 


A short table of $\lambda$-values is given in Kolmogoroff [3]. These values are 
based on a table by Smirnoff [4]. 
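Where no table is at hand, the limiting series can be summed and inverted numerically. The following Python sketch sums the series $1 - 2\sum_{j \ge 1} (-1)^{j+1} e^{-2j^2\lambda^2}$ and finds $\lambda$ by bisection. The search interval, the tolerance and the number of terms are arbitrary choices made for the illustration.

```python
from math import exp

def kolmogorov_alpha(lam, terms=100):
    """Limiting coverage 1 - 2 * sum_{j>=1} (-1)^(j+1) exp(-2 j^2 lam^2)."""
    return 1.0 - 2.0 * sum((-1) ** (j + 1) * exp(-2.0 * j * j * lam * lam)
                           for j in range(1, terms + 1))

def lambda_for(alpha, lo=0.3, hi=3.0, tol=1e-10):
    """Bisection on kolmogorov_alpha, which increases with lam."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_alpha(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(lambda_for(0.95), 3))  # 1.358
```

The half-width then follows as d = lambda / sqrt(n).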

Formula (4) may also be used in a different way. For a given 
investigation we may want to fix beforehand not only the confidence 
coefficient $\alpha$ but also the width of the confidence band we are going to 
obtain. Then with the help of (4) we can determine the size $n$ of a 
sample which will satisfy these requirements, 


(7)    $n = \lambda^2 / d^2.$ 


Thus if we should decide that we want a 95% confidence band such that 
the upper and lower bounds for $F(x)$ should not be more than .2 apart, 
we have to take $\lambda = 1.35$ and $d = .1$. Substituting in (7) we find that we 
shall have to use a sample containing at least 183 observations. 
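The arithmetic of this example is easily reproduced. In the sketch below the function name is introduced for illustration, and the result is rounded up to the next whole observation.

```python
from math import ceil

def sample_size(lam, d):
    """Smallest whole n with lam / sqrt(n) <= d, i.e. n >= lam^2 / d^2."""
    return ceil(lam ** 2 / d ** 2)

print(sample_size(1.35, 0.1))  # 183
```

Halving the permitted half-width d multiplies the required sample size by four.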

It is interesting to note that the width of the confidence band is 
inversely proportional to the square root of the sample size. This is 
equivalent to saying that the accuracy of our statements is directly 
proportional to the square root of the sample size. 


One-Sided Limits. As in the parametric case when it may happen 
that we are only interested in an upper (or lower) limit for the unknown 
parameter, it may happen in the non-parametric case that we only 
need an upper (or lower) limit for the unknown cdf. Exactly the same 
method applies as in the two-sided case, except that now we have to 
substitute $\bar{\alpha}$ as computed by (3) for $\alpha$, i.e., we can state with confidence 
coefficient $\bar{\alpha}$ that $F(x) \le U(x)$ (or $L(x) \le F(x)$), $-\infty < x < +\infty$. 

An asymptotic approximation is also available. Smirnoff [2] has shown 
that 


(8)    $\lim_{n \to \infty} \bar{\alpha} = 1 - e^{-2\lambda^2},$ 


where again $\lambda = d\sqrt{n}$ as in (4). (8) is very easily solved for $\lambda$. Indeed 
we get $\lambda = +\sqrt{-\tfrac{1}{2}\ln(1 - \bar{\alpha})}$, where the logarithm is the natural or 
Naperian logarithm. 
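As a small numerical check of this solution, the following Python lines compute $\lambda$ for a one-sided coefficient of .95 and the corresponding d for a sample size chosen arbitrarily as n = 100.

```python
from math import exp, log, sqrt

def one_sided_lambda(alpha_bar):
    """Solve alpha_bar = 1 - exp(-2 * lam^2) for the positive root lam."""
    return sqrt(-0.5 * log(1.0 - alpha_bar))

lam = one_sided_lambda(0.95)
print(round(lam, 3))          # 1.224
print(lam / sqrt(100))        # d for a hypothetical sample of 100
```

Substituting the result back into (8) returns the coefficient .95.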


Goodness of Fit. It may be worth while pointing out that $L(x)$ and $U(x)$ 
can also be used to test the hypothesis that the unknown cdf $F(x) = F_0(x)$, 
where $F_0(x)$ is some given cdf. If we reject this hypothesis 
whenever $F_0(x)$ intersects either $L(x)$ or $U(x)$ or both, we shall be using 
a test with a critical region of size $1 - \alpha$. 
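One way to carry out this test in practice is to compute the largest vertical distance between the sample cdf and the hypothesised cdf and compare it with d. The Python sketch below does this under the assumption that the hypothesised cdf is continuous. The data, the uniform hypothesis and the value of d are made up for the illustration.

```python
def max_deviation(sample, f0):
    """Largest vertical gap between the empirical cdf and a given cdf f0."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        # F_n jumps from (i - 1)/n to i/n at x; check both sides of the step.
        gap = max(gap, i / n - f0(x), f0(x) - (i - 1) / n)
    return gap

def rejects(sample, f0, d):
    """True when f0 leaves the band F_n(x) - d <= F(x) <= F_n(x) + d somewhere."""
    return max_deviation(sample, f0) > d

def uniform_cdf(x):
    """Hypothesised cdf: uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

data = [0.1, 0.3, 0.5, 0.7, 0.9]          # made-up observations
print(rejects(data, uniform_cdf, d=0.5))  # False
```

Because the hypothesised cdf always lies between 0 and 1, the clipping of the band at 0 and 1 plays no part in the comparison.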


Confidence Bands Giving Confidence Intervals for $F(x)$ at a Specified 
Value $x$. The method which we have just discussed assures us a 
confidence band that with probability $\alpha$ covers the true cdf $F(x)$ in its 
entirety. It is not difficult to see, however, that for any given $x = x_0$, 
say, the probability that the corresponding interval $[L(x_0), U(x_0)]$ 
will contain the true value $F(x_0)$ is often considerably greater than $\alpha$. 
It follows that if we are only interested in finding a band which for a 
given, though arbitrary, value $x_0$ contains the true value $F(x_0)$ with 
probability $\alpha$, we can use a narrower band, thus increasing the 
accuracy of our statements. Such a band is easily found. In fact, let 
$F(x_0) = p$. Then $p$ can be considered as the unknown parameter of a 
binomial variate $X$ defined by $P\{X \le x_0\} = p$ and $P\{X > x_0\} = 1 - p = q$. 
We have reduced the problem to a parametric one the solution of 
which is well known. 

Using $F_n(x_0)$, where $F_n(x)$ is the sample cdf as before, as the sample 
estimate of $p$ we can find two quantities $L'(x_0)$ and $U'(x_0)$ in such 
a way that $P\{L'(x_0) \le F(x_0) \le U'(x_0)\} = \alpha$. Now $x_0$ has been 
completely arbitrary. Letting it take all values from $-\infty$ to $+\infty$ we get 
two functions $L'(x)$ and $U'(x)$ determining a confidence band which 
satisfies our requirements. To distinguish between the two confidence 
bands we have obtained we shall refer to the one given by $L(x)$ and 
$U(x)$ as the type 1 and the one given by $L'(x)$ and $U'(x)$ as the type 
2 band. 

If the sample size $n$ is sufficiently large so that we can use the normal 
approximation to the binomial distribution, $L'(x)$ and $U'(x)$ are given 
by 


(9)    $L'(x) = \dfrac{n}{n + t^2} \left[ F_n(x) + \dfrac{t^2}{2n} - t\sqrt{\dfrac{F_n(x)[1 - F_n(x)]}{n} + \dfrac{t^2}{4n^2}} \right],$ 

       $U'(x) = \dfrac{n}{n + t^2} \left[ F_n(x) + \dfrac{t^2}{2n} + t\sqrt{\dfrac{F_n(x)[1 - F_n(x)]}{n} + \dfrac{t^2}{4n^2}} \right],$ 


respectively, as is shown, e.g., in Cramér [5], p. 514, Ex. 2, where $t$ is 
the $100(1 - \alpha)\%$ value of a normal deviate. 

Let w(x) = U'(x) − L'(x) be the width of the type 2 confidence band.
From (9) we find

\[
w(x) = \frac{2nt}{n+t^2}\sqrt{\frac{F_n(x)[1-F_n(x)]}{n} + \frac{t^2}{4n^2}}.
\tag{10}
\]

Thus the width is no longer a constant as it was for the confidence
band of type 1, but is now a function of x having its maximum value
for those x for which F_n(x) = 1/2 or is closest to 1/2.

It is instructive to compare this maximum width with 2d, the width
of the type 1 band, in an example. Let α = .95 and n = 216, the size of
the sample we shall consider later. We find 2d = 2λ/√n = 2.7/√216
= .184. For F_n(x) = 1/2, (10) reduces to t/√(n + t²). From a table of
normal deviations we find t = 1.96. Substituting we get w = .132, which
is quite an improvement over .184.
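The comparison can be checked numerically. The following is a minimal sketch, assuming the reading of (9) and (10) in which the limits are n/(n + t²) times F_n(x) + t²/2n minus or plus t√(F_n(x)[1 − F_n(x)]/n + t²/4n²); the function names are illustrative and do not come from the paper.

```python
import math

def type2_band(f_n, n, t):
    """Limits L'(x), U'(x) of formula (9) for one value f_n = F_n(x)."""
    centre = f_n + t * t / (2 * n)
    half = t * math.sqrt(f_n * (1 - f_n) / n + t * t / (4 * n * n))
    scale = n / (n + t * t)
    return scale * (centre - half), scale * (centre + half)

def type2_width(f_n, n, t):
    """Width w(x) = U'(x) - L'(x) of formula (10)."""
    lower, upper = type2_band(f_n, n, t)
    return upper - lower

n, t = 216, 1.96
w_max = type2_width(0.5, n, t)        # maximum width, attained at F_n(x) = 1/2
type1_width = 2.7 / math.sqrt(n)      # 2*lambda/sqrt(n), with 2*lambda = 2.7 as in the text
print(round(w_max, 3), round(type1_width, 3))   # 0.132 0.184
```

At F_n(x) = 1/2 the computed width agrees with the closed form t/√(n + t²) quoted in the text.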

So far we have assumed that we were able to make use of the normal
approximation to the binomial distribution. If this approximation
is not accurate enough, we have to compute binomial confidence
intervals as shown by Clopper and Pearson [6]. Now for x_k ≤ x < x_{k+1},
k = 0, 1, ..., n, x_0 = −∞, x_{n+1} = +∞, we have

\[
L'(x) = \eta_k, \qquad U'(x) = \theta_k
\tag{11}
\]

where η_k is the lower, θ_k the upper binomial confidence limit
corresponding to the observed frequency ratio F_n(x_k). The original
graphs of Clopper and Pearson together with others have recently
been reproduced in Eisenhart [7], pp. 332-335. These graphs show η
and θ as functions of the observed sample ratio for n = 10, 15, 20, 30,
50, 100, 250, 1000 corresponding to α = .95, .99 and for n = 5, 10, 15,
25, 50, 100, 250, 1000 corresponding to α = .80, .90. For other values
of n, η and θ have to be found by interpolation.

The exact values of η_k and θ_k corresponding to any n are given by

\[
I_{\eta_k}(k,\; n-k+1) = (1-\alpha)/2
\tag{12}
\]

\[
I_{1-\theta_k}(n-k,\; k+1) = (1-\alpha)/2
\tag{13}
\]

where

\[
I_x(p, q) = \int_0^x t^{p-1}(1-t)^{q-1}\,dt \Big/ \int_0^1 t^{p-1}(1-t)^{q-1}\,dt
\]

is the incomplete beta function. By definition η_0 = 0, θ_n = 1. It is
sufficient to solve either for the η's or the θ's since by (12) and (13)

\[
\eta_k = 1 - \theta_{n-k}.
\]

To find, e.g., η_k we can make use of the tables of percentage points
of the incomplete beta function by Thompson [8], entering these
tables with ν_1 = 2(n − k + 1) and ν_2 = 2k on the page giving the
100(1 − α)/2 percentage points.
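Where tables are not at hand, the limits can be found numerically. The sketch below is one possible approach and not the tabular method of the text: it solves the equivalent binomial tail equations P{X ≥ k | η} = (1 − α)/2 and P{X ≤ k | θ} = (1 − α)/2 by bisection, assuming that this is the intended reading of (12) and (13). All function names are illustrative.

```python
import math

def binom_tail_ge(k, n, p):
    """P{X >= k} for X ~ Binomial(n, p); this is an increasing function of p."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def solve_increasing(f, target, tol=1e-12):
    """Bisection for f(p) = target on [0, 1], where f is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def binomial_limits(k, n, alpha):
    """(eta_k, theta_k) for k successes in n trials, confidence coefficient alpha."""
    tail = (1 - alpha) / 2
    # Lower limit: P{X >= k | eta} = tail; by definition eta_0 = 0.
    eta = 0.0 if k == 0 else solve_increasing(lambda p: binom_tail_ge(k, n, p), tail)
    # Upper limit: P{X <= k | theta} = tail, i.e. P{X >= k+1 | theta} = 1 - tail;
    # by definition theta_n = 1.
    theta = 1.0 if k == n else solve_increasing(
        lambda p: binom_tail_ge(k + 1, n, p), 1 - tail)
    return eta, theta

eta_3, theta_3 = binomial_limits(3, 10, 0.95)
eta_7, theta_7 = binomial_limits(7, 10, 0.95)
print(eta_3, theta_3)
```

The relation η_k = 1 − θ_{n−k} can be verified on the two results: eta_3 and 1 − theta_7 agree to within the bisection tolerance.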

When introducing the confidence band of type 2 we stated that we
wanted a confidence band such that the probability that for any
arbitrary x_0, L'(x_0) ≤ F(x_0) ≤ U'(x_0) was α. This statement may have been
somewhat misleading. We cannot make this probability exactly equal
to α. For large n we used the normal approximation, thus committing
a slight error, while for small n, due to the discontinuous character of
the binomial distribution, exact confidence intervals do not exist, and
we have to be satisfied with the statement that the confidence
coefficient is ≥ α.


It may be well to illustrate the use of the two types of confidence
bands by some examples. An economist may want to analyze the
income structure in a given community. If he is interested in the
distribution of income over a specific range he will need a type 1 band,
since he is looking for the joint occurrence of certain events. If, on the
other hand, he only wants to state that at least l% and at most u%
of the population earn no more than $.... a year, a type 2 band will give
him the appropriate answer.

It may happen that we are not interested in an upper but only in a 
lower limit. A social worker who wants to prove the need for a new 
hospital in a certain section will want to state that at least such and 
such a percentage of the residents earn less than $.... a year and thus 
cannot afford to go to a private hospital if the need should arise. 
The answer will be given by a type 2 band constructed with the help
of one-sided confidence intervals. Then U'(x) = 1, while L'(x) = η_k,
where again η_k is the solution of (12), except that this time the right
hand side should read 1 − α.


Confidence intervals for Quantiles. A confidence band of type 2 can also
be used to construct confidence intervals for quantiles, i.e., confidence
intervals for the value q_p for which F(q_p) = p, 0 < p < 1. Such a
confidence interval consists of all those values x for which L'(x) < p < U'(x).
These values of x are bounded by two observations, x_i and x_j, say, x_i
being the smallest value for which U'(x) > p, x_j the smallest for which
L'(x) ≥ p. If α is the confidence coefficient connected with our
confidence band, we can then say that

\[
x_i \le q_p \le x_j
\]

is a confidence interval with confidence coefficient α for the unknown
quantile q_p, i.e., in the long run confidence intervals chosen in this
way will include the true value q_p 100α% of the time.

If a type 2 confidence band has been constructed, x_i and x_j can easily
be read off. However, we can find the two values also algebraically.
To be exact, using the formulas for L'(x) and U'(x) we can find two
integers i and j such that the corresponding observations x_i and x_j
are the two observations we are looking for.

For large n the type 2 confidence band is given by (9). Let as usual
[y] stand for the largest integer ≤ y and set

\[
y_1 = np - t\sqrt{np(1-p)}, \qquad
y_2 = np + t\sqrt{np(1-p)},
\tag{14}
\]

where as in (9) t stands for the 100(1 − α)% value of a normal deviate.
Then i and j are found to be given by

\[
i = [y_1] + 1, \qquad j = [y_2] + 1,
\tag{15}
\]

unless y_2 is an integer itself, in which case j = y_2.

Since we are only using an approximate confidence band, the
solution given by (15) may sometimes lead to a confidence interval the true
confidence coefficient of which is slightly < α. Therefore, a safer, though
very often unnecessarily wide, confidence interval is given by i = [y_1],
j = [y_2] + 1. It should also be remembered that if p (or 1 − p) is small
the normal distribution is not well suited to serve as an approximation
to the binomial distribution, even if n is relatively large.
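The rule for i and j is easily mechanized. The following is a minimal sketch of (14) and (15), together with the safer variant i = [y_1], j = [y_2] + 1; the function name and the `safer` flag are illustrative. The test for an integral y_2 relies on exact floating-point equality, which is an assumption that may fail for values computed with rounding error.

```python
import math

def quantile_ci_indices(n, p, t, safer=False):
    """Ranks (i, j) of the order statistics bounding the quantile q_p."""
    root = t * math.sqrt(n * p * (1 - p))
    y1, y2 = n * p - root, n * p + root                       # formula (14)
    i = math.floor(y1) if safer else math.floor(y1) + 1       # (15), or safer i = [y1]
    j = math.floor(y2) + 1
    if not safer and y2 == math.floor(y2):
        j = int(y2)                                           # y2 an integer itself
    return i, j

print(quantile_ci_indices(216, 0.5, 1.96))               # (94, 123)
print(quantile_ci_indices(216, 0.5, 1.96, safer=True))   # (93, 123)
```

The sketch does not check that the ranks fall between 1 and n; for p near 0 or 1 they may not, which is one more sign that the normal approximation is then unsuitable.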

If the type 2 confidence band is given by (11), i and j are found to
be defined by the relations

\[
\theta_{i-1} \le p, \quad \theta_i > p, \qquad
\eta_{j-1} < p, \quad \eta_j \ge p,
\]

where the η's and θ's are given by (12) and (13).

A case of special interest arises when p = 1/2. Then q_p is the median
M of the unknown distribution. For this case Nair [9] has tabulated
the values of i and j for n ≤ 81 corresponding to α = .95 and n ≤ 76
corresponding to α = .99.

We shall close with one specific application. In biological work it is 
often important to find a confidence interval for the median lethal dose 
of a given drug. This is the dose which would kill 50% of the animals 
of a given population. In the second experiment described in Bliss [10] 
the individual lethal doses of digitalis (cc. of tincture per kg. of life 
weight) of 216 test animals, in this case cats, were obtained. These
data,² after adjustment for laboratory differences, can, with some
caution,³ be considered as a random sample of individual lethal doses.

The usual procedure for finding a confidence interval for the median
dose is to take logarithms of the observations and assume that the
log-doses are then normally distributed. If we are willing to make this
assumption a confidence interval for the median log-dose can be found
by a well known procedure which, however, involves considerable
computation. The fact that the log-doses are approximately normally
distributed has been observed in many experiments of this kind, but
Mather [12], esp. p. 240, warns that for any new drug or new method of
preparation the logarithmic transformation has to be justified anew.
Then what shall we do in such a case if we have a sample of
observations which is obviously not large enough to carry out a conclusive
test of normality? An unjustified assumption of normality may lead
to a serious mistake in our conclusions. However, we may use the more
general procedure outlined in this section. The assumed continuity of
the unknown distribution function is hardly a restriction. Besides,
Scheffé and Tukey [13] have shown that the same procedure is
applicable for all possible cdf’s, if we only change the confidence statement
slightly.

2 I want to thank Professor Bliss for putting these data at my disposal.
3 See in this connection [11].

Let us now return to the data mentioned above and find a confidence
interval for the median dose without assuming that by transforming to
logarithms we can make use of normal theory. Substituting p = 1/2 and
n = 216 in (14) we find at the 95% level

\[
y_1 = 108 - .98\sqrt{216} = 93.6, \qquad
y_2 = 108 + .98\sqrt{216} = 122.4,
\]

or by (15)

\[
i = 94, \qquad j = 123.
\]


All that remains to be done is to order the observations in increasing 
order of size and select the 94th and 123rd observations. Thus we find 
the confidence interval 


\[
.640 \le M \le .671.
\]
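Since the ranks 94 and 123 rest on the normal approximation, it may be of interest to compute the exact confidence coefficient of the interval. The sketch below uses a standard result that is not stated in the text above: for a continuous cdf the number of observations not exceeding q_p is binomial with parameters n and p, so that the interval from x_i to x_j covers q_p with probability P{i ≤ B ≤ j − 1}. The function name is illustrative.

```python
import math

def order_stat_coverage(n, p, i, j):
    """P{x_i <= q_p <= x_j} = P{i <= B <= j - 1}, B ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, j))

coverage = order_stat_coverage(216, 0.5, 94, 123)
print(round(coverage, 4))
```

For the median of n = 216 observations the value obtained lies close to the nominal .95.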


It may be worth while emphasizing at this point the extreme ease 
with which all the non-parametric procedures we have encountered 
can be carried out. 

A more important problem of biological assay than determining a 
confidence interval for the median dose is to find a confidence interval 
for the relative potency of a preparation of unknown potency com- 
pared with that of a standard preparation. As a matter of fact the 
data we used were originally of this kind. Again this problem is usually
solved on the basis of normal theory; however, non-parametric methods
are also available in this case. I hope to be able to come back to this
problem at some later date.

One more remark before concluding—it is often stated that non- 
parametric methods lead to much wider confidence intervals than 
parametric methods, in particular the use of normal theory. This is
of course true, and parametric methods should be used whenever
there is a sound basis for their application, unless the greater ease of 
non-parametric methods should outweigh the advantage that can be 
gained by the use of parametric methods. However, to assume that a 
sample has been drawn from a normal population for the sole reason of 
obtaining narrower confidence intervals is to defeat the purpose of 
statistics.

A MATHEMATICAL THEORY is presented which, when applied to a
comparison of the registrar’s list of births and deaths with a list
obtained in a house-to-house canvass, gives an estimate of the total
number of events over an area in a specified period; also the extent
of registration.

In the development of the theory, allowance is made for the fact that 
the chance of an event being missed on one list (registrar’s list or the 
house-to-house canvass) may not be independent of its chance of being 
missed on the other list. Where there is likely to be lack of independ- 
ence, a test is suggested and a method introduced to reduce the effect 
of dependence. This is done by subdividing the data into small ho- 
mogeneous groups, such as might be formed by small areas, sex and age 
classes, domiciliary and institutional births; then by estimating the 
number of events in these groups separately and summing them for a 
total. The standard errors of the estimates are given. 

The theory is applied to an enquiry that was conducted in February 
1947 over an area known as the Singur Health Centre, near Calcutta, 
covering the years 1945 and 1946 separately, and it is found that the 
estimated total number of events for the area is usually greater when the 
estimate is built up by summing the totals for individual groups than 
when it is computed at once for the aggregated population. According 
to the theory this observation confirms positive dependence and in- 
dicates that the greater figure is nearer the truth. 

The annual number of births and deaths in the Singur Health Centre 
(total population 64,000) is estimated subject to a standard error of 
from 1 to 3 per cent, and the registration is estimated to vary from about
40 to 70 per cent with a standard error of about 3 per cent. This 
enquiry provides basic ground work for the design of future surveys, 
and it is estimated that at a cost of Rs. 10,000 to Rs. 15,000 (3 rupees 
to the U.S. dollar) estimates of birth and death rates for an entire Dis- 
trict in India with a population of one to two millions can be obtained 
with an overall standard error of about 5 per cent.

Purpose. The purpose here is to present a theory by which, when vital
registration is incomplete, an enquiry in the form of a house-to-house
canvass may be used in conjunction with the registrar’s list to estimate,
i. the total number of births and deaths in an area over a specified
period; ii. the birth and death rates; iii. the deficiencies in registration;
and iv. the standard errors of all these estimates. The theory will
first be presented, then applied to particular surveys in the Singur
Health Centre.


Method of enquiry. The application of the theory which is to be
developed requires a comparison of the entries on:


1. The registrar’s list (referred to as R)

2. The result of a complete house-to-house canvass carried
out by an interviewer (referred to as I)

and the classification of the entries on these lists into the following
four exhaustive groups:


C, the number of entries recorded in I and also in R (such
entries, being found on both lists, are assumed to be correct
without investigation).


N_1, entries recorded only in R but not in I, and after
investigation found to be correct.


N_2, entries recorded only in I but not in R, and after
investigation found to be correct.


X, entries recorded on one list or the other, but not both, and
found after investigation to be incorrect.


This is a complete classification of the entries on the lists but not of 
the events. There will also be a number Y of events which are missed 
by both lists; this number will be estimated later by application of the 
theory. 


Theory. Let N be the total number of events (births or deaths) in the
specified period. Then an estimate N̂ of N is furnished by the formula

\[
\hat{N} = C + N_1 + N_2 + N_1 N_2 / C
\]

wherein N_1 N_2/C is an estimate of the number of events Y missed by
both R and I. This formula of estimation assumes that the chance of an
event being missed on either list is independent of the chance of being
missed on the other. A method is presented later on for investigating
the validity of the assumption of independence and for introducing a
modification where necessary.

It can be shown that: i. N̂ is an unbiased estimate in the limit when
N becomes large and the assumption just mentioned is valid; ii. the
maximum likelihood estimate is equal to N̂ in the limit; iii. the standard
error of N̂ is √(N q_1 q_2 / p_1 p_2). The last formula will be developed in the
appendix. Here,


p_1 = the chance of R detecting an event


p_2 = the chance of I detecting an event


q_1 + p_1 = q_2 + p_2 = 1.


It follows that the better the performance in either R or I, the higher
be p_1 or p_2, the smaller be q_1 or q_2 and the more precise be the estimate
N̂ of the total of events. It follows, moreover, that the precision of N̂,
expressed as a proportion (namely as a coefficient of variation), is
√(q_1 q_2 / N p_1 p_2), wherefore if the theory be applied over an area large
enough to contain a large number N of events, the total number N of
events will be estimated with great relative precision.

The symbol p_1 is a measure of performance of the registrar, an
estimate for which is p̂_1 = C/(C + N_2). This estimate p̂_1 of p_1 is subject
to a coefficient of variation of

\[
\sqrt{\frac{q_1}{(C + N_2)\,p_1} \cdot \frac{N - C - N_2}{N - 1}}.
\]

This error decreases as C + N_2 increases. For perfect performance on the
part of the interviewer, C + N_2 = N, and there is then no error in
estimating the performance of the registrar.
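The formulas of this section may be put together in a few lines. The following is a minimal sketch under the assumption of independence; the counts are hypothetical and serve only as illustration. The estimate p̂_2 = C/(C + N_1) for the interviewer is not given in the text and is assumed here by symmetry with p̂_1, and the unknown N, p_1, p_2 in the standard error are replaced by their estimates.

```python
import math

def estimate_total_events(c, n1, n2):
    """Estimate of N and its standard error from the counts C, N1, N2 (c > 0)."""
    n_hat = c + n1 + n2 + n1 * n2 / c      # C + N1 + N2 + N1*N2/C
    p1_hat = c / (c + n2)                  # registrar's performance
    p2_hat = c / (c + n1)                  # interviewer's performance (assumed by symmetry)
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    # Standard error sqrt(N q1 q2 / (p1 p2)) with estimates substituted.
    se = math.sqrt(n_hat * q1_hat * q2_hat / (p1_hat * p2_hat))
    return {"n_hat": n_hat, "p1_hat": p1_hat, "p2_hat": p2_hat,
            "se": se, "cv": se / n_hat}

# Hypothetical counts, not taken from the Singur enquiry.
result = estimate_total_events(c=800, n1=100, n2=150)
print(result["n_hat"], round(result["se"], 2))
```

With these counts the estimate is 1068.75, of which 18.75 is the estimated number missed by both lists.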

The foregoing development is oversimplified. In practice there are
some problems to take account of: incomplete investigation of the R
lists; incomplete coverage of the population in the house-to-house
canvass. Special types of events, like those occurring in institutions,
are best taken care of as a separate group. Then again there is the
problem of investigating the assumption mentioned above, and of
measuring and correcting for the correlation between the chance of an
event being missed by R and being missed by I. These points will be
examined in the following paragraphs.








Effect of incomplete investigation of the registrar’s lists. In the investiga- 
tion of the R-lists there may be some entries left over unclassified by 
reason of incompleteness of entry, illegibility, or simple failure for any 
reason whatever on the part of the investigator to finish his job. So long 
as the correct entries amongst the unclassified entries on the R-list
constitute unbiased samples from the two categories C and N_1
mentioned earlier, the omission of the unclassified entries from the
calculations does not affect the estimation of N, the total number of events.
The estimate of the extent of registration will be too low if the un- 
classified entries contain, as is likely, correct entries classifiable as C. 
If the unclassified entries are all counted as correct, contrary to fact, 
the calculations will lead to an overestimate of the extent of registra- 
tion. 


Effect of incompleteness of coverage of the population. As in every 
population enquiry, there will be some failures to elicit information 
from all the households. This will happen when some households in 
which an event took place have moved away temporarily or per- 
manently, or when no responsible person can be found at home to give 
the information. So long as the events in the uninterviewed portion of 
households are included in the R-list to the same extent as those in 
the interviewed households, the estimation of N is unaffected. The 
calculation of N may therefore be little affected by incompleteness of 
coverage of the population.


The effect of institutional events. In rural areas the bulk of the births 
are domiciliary, but there are some small scattered hospitals drawing 
patients from a wide area, and a high proportion of the events that take 
place in them are for non-residents. The R-list may contain some or 
even all of the entries for these institutional events because the 
registrar is able to ascertain this information easily and accurately from 
the institutions. The interviewer, on the other hand, will, by the na- 
ture of a house-to-house canvass, fail to discover an institutional event 
concerning people who had no family connections in the area. In- 
stitutional events, as they are accurately ascertainable, are best 
handled as a separate block and not as a problem of estimation. 


The effect of correlation between events missed on both lists. The first
step is to define this correlation. The registrar and his co-workers
will detect some events and miss others. The probability that the
interviewer [I] will detect an event that was missed by R may be
different from the probability that he will detect an event that was recorded
by R. If these two probabilities are equal there is complete independence,
but otherwise there is not, in which case the formula given above for
the estimation of the total number of events will be incorrect. The
extent of the error can be investigated. If as before,

p_1 = the probability of the registrar detecting an event

q_1 = the probability of the registrar missing it

then the probabilities in the 4 groups will be shown by the
accompanying table, which defines four new probabilities, p_21, p_22, q_21, q_22.
Here p_21 is the probability that the interviewer detects an event recorded
by the registrar, and p_22 the probability that he detects an event missed
by the registrar. p and q are always complementary:
p_21 + q_21 = p_22 + q_22 = 1.














Group                                     Probability
C      Detected by both                   p_1 p_21
N_1    Detected by registrar only         p_1 q_21
N_2    Detected by interviewer only       q_1 p_22
Y      Missed by both                     q_1 q_22





If there is complete independence between the events missed by both
R and I, then p_21 = p_22 = p_2, introduced previously, and q_21 = q_22 = q_2.
When there is dependence the expected value of the estimate of the
number of events Y missed by both R and I will be close to

\[
N\,\frac{p_1 q_{21}\, q_1 p_{22}}{p_1 p_{21}}
\]

whereas the correct value is N q_1 q_22. The difference is

\[
N\,\frac{p_1 q_{21}\, q_1 p_{22}}{p_1 p_{21}} - N q_1 q_{22}
  = N q_1 \left( \frac{p_{22}}{p_{21}} - 1 \right).
\]

So if p_21 > p_22, the total number of events is underestimated and if
p_21 < p_22, the converse. We surmise that p_21 > p_22 is likely to be the case.

Similarly, in the case of dependence, the registrar’s performance is
estimated as p_1 p_21/(p_1 p_21 + q_1 p_22) instead of p_1, the difference being
(p_21 − p_22) p_1 q_1/(p_1 p_21 + q_1 p_22). If p_21 > p_22 the registrar’s
performance is overestimated and if p_21 < p_22, the converse.


If

    p_1  = .8     q_1  = .2
    p_21 = .6     q_21 = .4
    p_22 = .4     q_22 = .6

the bias in the estimation of the total number of events will be

\[
q_1 \left( \frac{p_{22}}{p_{21}} - 1 \right) = -.067 \text{ or } -6.7 \text{ per cent.}
\]

This bias may be much more important than the standard error of an
estimate of the total number of events made under the assumption of
zero correlation.
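The numerical example may be verified directly from the four group probabilities. The following is a minimal sketch; the function names are illustrative. It computes, per event, the expected value of the estimate N_1 N_2/C less the true proportion missed by both, and also the value to which the estimate of the registrar's performance tends.

```python
def dependence_bias(p1, p21, p22):
    """Relative bias of the estimated total under dependence (per event)."""
    q1, q21, q22 = 1 - p1, 1 - p21, 1 - p22
    expected_estimate = (p1 * q21) * (q1 * p22) / (p1 * p21)   # approx. E[N1*N2/C] / N
    true_missed = q1 * q22                                     # proportion missed by both
    return expected_estimate - true_missed

def estimated_registrar_performance(p1, p21, p22):
    """Value to which C/(C + N2) tends under dependence."""
    q1 = 1 - p1
    return p1 * p21 / (p1 * p21 + q1 * p22)

bias = dependence_bias(0.8, 0.6, 0.4)
print(round(bias, 3))                                                   # -0.067
print(round(estimated_registrar_performance(0.8, 0.6, 0.4), 3))         # 0.857
```

With p_21 > p_22 the total is underestimated by 6.7 per cent, while the registrar's performance appears as about .857 instead of the true .8.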


Method to reduce the effect of correlation. It is important to note that 
correlation signifies heterogeneity in the population for it implies that 
events that fail to be detected do not form a random sample from 
the whole population of events. This heterogeneity may arise only if 
there are differences in the reporting rates for different segments of the 
population, resulting in the group of failures being weighted dispropor- 
tionately by the different segments. 

It therefore follows that the correlation can be minimized by dividing 
the population into homogeneous groups and calculating the total 
number of events separately for each group; then by addition getting 
the grand total. In order to put this suggestion into practice, let us
consider the difference between two estimates of the total number of
events: i. by dividing the population into homogeneous groups and
estimating the events in each group separately, then forming a grand
total; ii. by treating the entire population as a unit. Let the population
be comprised of k homogeneous groups, with N_i events in the i-th
group (i = 1, 2, ..., k). Then let p_1^(i) be the probability of the
registrar detecting an event in the i-th group, and p_2^(i) the corresponding
probability for the interviewer. The expected value of the number of
events missed by both in the i-th group is N_i q_1^(i) q_2^(i), and for the entire
population the total missed by both will be Σ N_i q_1^(i) q_2^(i). As by
definition there are only k homogeneous groups, this value will be estimated
without bias when the groups are treated separately. But if the entire
population of events were pooled, the expected value for the estimate
of the number of events missed by both would be close to


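\[
\frac{\left[\sum N_i\, p_1^{(i)} q_2^{(i)}\right]\left[\sum N_i\, q_1^{(i)} p_2^{(i)}\right]}
     {\sum N_i\, p_1^{(i)} p_2^{(i)}}
\]

The difference in the two values will be

\[
\frac{\left[\sum N_i\, p_1^{(i)} q_2^{(i)}\right]\left[\sum N_i\, q_1^{(i)} p_2^{(i)}\right]}
     {\sum N_i\, p_1^{(i)} p_2^{(i)}}
 - \sum N_i\, q_1^{(i)} q_2^{(i)}
 = -\,\frac{N^2 S_1 S_2\, r}{\sum N_i\, p_1^{(i)} p_2^{(i)}}
\]

where

\[
S_1^2 = \frac{\sum N_i \left[p_1^{(i)} - \bar{p}_1\right]^2}{N}, \qquad
S_2^2 = \frac{\sum N_i \left[p_2^{(i)} - \bar{p}_2\right]^2}{N},
\]

\[
N = \sum N_i, \qquad
\bar{p}_1 = \frac{\sum N_i\, p_1^{(i)}}{N}, \qquad
\bar{p}_2 = \frac{\sum N_i\, p_2^{(i)}}{N},
\]

and

\[
r = \frac{\sum N_i \left[p_1^{(i)} - \bar{p}_1\right]\left[p_2^{(i)} - \bar{p}_2\right]}{N S_1 S_2}
\]

is the correlation coefficient between p_1^(i) and p_2^(i), weighted by N_i,
the number of events to which they have reference. If r > 0, then
treating the entire population as a unit, we are led to an underestimation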
of the number of events missed by both parties and therefore an un- 
derestimation of the total number of events. This also results in an 
overestimate of the extent of registration. If this is the case, the popu- 
lation need be divided only to the stage when further division shows no 
increase in the total number of events. It should be possible by actual 
trial with some real data to decide whether (e.g., in computing number 
of deaths) 5-year age groups are a more effective subdivision than 10- 
year age groups; and whether infant deaths should be treated sep- 
arately. 
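The effect of subdivision can be seen in a small computation. The following is a minimal sketch with hypothetical counts for two groups in which the registrar and the interviewer both perform well in the first group and poorly in the second (positive correlation); neither the counts nor the function name come from the enquiry.

```python
def estimate_total(c, n1, n2):
    """C + N1 + N2 + N1*N2/C for one group (requires c > 0)."""
    return c + n1 + n2 + n1 * n2 / c

# Hypothetical (C, N1, N2) counts for two homogeneous groups.
groups = [(80, 10, 10), (20, 30, 30)]

# i. estimate each group separately, then form the grand total
stratified = sum(estimate_total(c, n1, n2) for c, n1, n2 in groups)

# ii. treat the entire population as a unit
pooled = estimate_total(
    sum(g[0] for g in groups),
    sum(g[1] for g in groups),
    sum(g[2] for g in groups),
)
print(stratified, pooled)   # 226.25 196.0
```

The pooled figure falls short of the sum of the group estimates, as the theory leads one to expect when r > 0.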

The enquiry in Singur Health Centre. The Singur Health Centre
consists of four contiguous Union Boards, viz., Singur, Balarambati,
Bora, and Begumpur, situated in the Serampore sub-division of the 
Hooghly district. The village Singur which serves as the headquarters 
is only 21 miles away from Calcutta and is easily accessible by rail 
from Calcutta. The total area of the Centre is about 33 square miles 
and comprises 68 villages with a total population of about 64,000
distributed over 12,000 families living in about 8,300 houses. As is usual 
in West Bengal, the villagers live close together in a compact block and 
wide fields separate such blocks. Since 1944 this area has formed the 
controlled practice field of the All India Institute of Hygiene and Public 
Health, Calcutta, for their experiment in Public Health Methodology. 

Procedure for registration. The procedure for the registration of births
and deaths in this area follows closely the method adopted in other
parts of Bengal. The Chowkidar, i.e., the village headman, is the
reporting agent and is required to submit periodically to the Sanitary
Inspector,² who is the registrar of the area, a list of births and deaths.
With a view to improving the registration in this area, the voluntary
services of a villager have been enlisted. He is not only expected to
assist the Chowkidar, who may be illiterate, by making entries in the
Chowkidar’s register, but also to inform the registrar directly on all
births and deaths in the village. The registrar also obtains a list of
births, maternal and infant deaths as known to the Maternity and
Child Welfare Department, and by co-ordinating the information from
the three sources is expected to improve birth and death registration.
For all practical purposes the voluntary agency began operating only
from January 1946.

1 The Bengal Province is divided into divisions, the divisions into districts, the districts into
subdivisions, the subdivisions into thanas, and the thanas into Union Boards.
2 A Sanitary Inspector is usually in charge of the health activities of a thana.

Method of enquiry. The enquiry in the Singur Centre covering 1945 
and 1946 was started on the 17th February 1947. The field work lasted 
for eleven weeks. In this enquiry an interviewer called on every house- 
hold to enumerate the resident population (separately as present and 
absent) and visitors with particulars of community, age, sex, and 
marital status, and to list all births and deaths which occurred in the 
village during 1945 and 1946, listing separately with relevant
particulars those that occurred outside the Singur Health Centre. The lists so
prepared constitute the I-list which, as was mentioned earlier, was compared
with the registration books (the R-list). In the field-organization as
actually employed, there were four investigators who worked at the 
comparisons and supervised the work of the 16 interviewers. The inter- 
viewers and the investigators were selected from the village popu- 
lation as it was thought that they would be able to obtain better co- 
operation than an outsider. 

It should be emphasized that the comparison of the two lists is crucial. 
The establishment of the identity of two entries, one on one list and 
one on the other, sometimes requires extreme perseverance. In some 
cases the registrar’s entry is by hearsay, and part of it may be wrong, 
and often much consultation is required. The interviewer's entry, how- 
ever, is fortunately accompanied by a house-number or other means of 
identification by which the information may be verified if necessary. 

Basic data obtained from the enquiry. Table I shows the results of the 
investigators’ comparisons of the R and I-lists. As mentioned earlier, 
there are some problems arising from illegible and incomplete entries, 
the movements of the population and institutional births. The table 
gives some idea of the magnitude of these problems. For example the 
non-verifiable entries on the registrars’ lists run to roughly 10% or more 
of their total entries. In view of their magnitude the assumption that 
the unverifiable entries are a representative sample of all entries, an 
assumption that will be made in the calculations, becomes all the more
important. The need of more careful registration in the future is
apparent.

No separate account was maintained of the number of correctly 
registered events occurring in families that had migrated out of the 
village prior to the interviewers’ survey. The assumption will be made 
that the registrars would have recorded this category to the same
degree as for the non-migrants, but the number is small and under the
conditions of the Indian village, this assumption is not important.

In this enquiry the non-resident institutional births and deaths are 
considered separately and excluded from the table, as indicated. Insti- 
tutional facilities exist only in the Singur Union Board. The number 
of the institutional births to non-residents was about 8% in 1945 and 
1946. The number of institutional deaths of non-residents was only 
about 3%. 

Estimation of total births and deaths. In order to investigate the 
homogeneity of smaller groups comprising the whole, so as to arrive at 
the best estimate of the total number of events, calculations were 
carried out— 

i. for the Centre as a whole (births and deaths) 
ii. for each Union Board separately; then these figures were 
combined (births and deaths) 
iii. for males and females separately for the Centre as a whole; 
then these figures were combined (deaths only) 
iv. for age groups by sex for the Centre as a whole; then the 
figures were combined (deaths only) 

In 1945 the total numbers of deaths as estimated by these four
methods were 2234, 2238, 2245, and 2418 respectively, each with a standard
error of approximately 70. In 1946 the numbers of deaths as estimated
by the four methods were 1,696, 1,684, 1,698 and 1,765, each with a
standard error of approximately 40. The closeness of the first three estimates
indicates that the chances of the registrar and the interviewer detecting 
an event did not vary to any marked extent between Union Boards 
and the sexes. The increase obtained by the fourth method clearly indi- 
cates that the chances of the interviewer and the registrar detecting a 
death may differ considerably with the age of the dead person. Positive 
correlation is indicated. 

Higher percentages of deaths in the younger age-groups were missed 
by both R and I as compared with adult age groups. The proportions
missed also show a tendency to increase in the more advanced
age-groups. It would be interesting to ascertain whether the estimate could
be increased still further by finer subdivision of age groups or by
subdivision in regard to other characteristics within each group, but no
further analyses were conducted.

As for births, the total number estimated from the data of the entire 
Centre was 2908 for 1945 and 3744 for 1946. Separate estimation for 
the four Union Boards when totalled yields 2915 and 3775 for the same 
years. It is to be noted that while the latter figures are the higher of the 
two, the figure for 1945 is higher by only 1/7th of the standard error 
and the figure for 1946 is higher by a whole standard error. 

The highest figure obtained by breaking the population into groups 
in various ways, and adding the estimated number of events, is to be 
accepted as nearest the true figure. The nonresident institutional 
events, which were left out of consideration may be added in to get the 
total number of events occurring in the area. 
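
The grouping rule above can be sketched numerically. This is a minimal sketch, assuming the estimator (C + N1)(C + N2)/C implied by the Appendix, where C is the number of events found on both lists, N1 the number found on the R-list only, and N2 the number found on the I-list only. The function names and the counts for the two groups are hypothetical, chosen only to show what happens when the chances of detection by the two lists are positively correlated.

```python
def estimate_total(c, n1, n2):
    """Estimated total number of events: (C + N1)(C + N2) / C."""
    if c <= 0:
        raise ValueError("at least one event must appear on both lists")
    return (c + n1) * (c + n2) / c


def stratified_total(groups):
    """Sum of separate estimates, one per (C, N1, N2) group."""
    return sum(estimate_total(c, n1, n2) for c, n1, n2 in groups)


# Hypothetical counts: both lists do well in group A and poorly in group B,
# so detection by R and detection by I are positively correlated overall.
groups = [(80, 10, 10), (10, 20, 20)]

pooled = estimate_total(
    sum(g[0] for g in groups),
    sum(g[1] for g in groups),
    sum(g[2] for g in groups),
)
by_group = stratified_total(groups)

print(f"estimate from pooled counts: {pooled:.2f}")
print(f"sum of the group estimates:  {by_group:.2f}")
```

With these made-up counts the sum over the groups exceeds the pooled estimate, which is the direction of the effect reported for the age grouping of deaths.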

Estimation of rates and incompleteness of registration. For computing 
birth and death rates over an area, the population base is furnished by 
the house-to-house canvass. The total number of correct entries in the 
R-list judged against the total estimated number of events, measures 
the extent of registration. Tables II and III show the results obtained 
for rates and for completeness of registration. 


TABLE II
BIRTH AND DEATH RATES IN 1945 AND 1946, SINGUR HEALTH CENTRE

                                          1945                    1946
                                             Standard                Standard
                                     Rate    error           Rate    error
Birth rate per 1,000 population      46.1    0.8             59.8    1.0
Death rate per 1,000 population      37.7    1.2             27.5    0.7
Specific death rate (males)          36.4    1.6             27.3    1.0
Specific death rate (females)        39.2    [illegible]     27.8    1.0

TABLE III
PERCENTAGE OF BIRTH AND DEATH REGISTRATION DURING 1945 AND 1946

                  Birth registration          Death registration
Union board       1945         1946           1945         1946
Singur            60.4-67.9    70.9-77.1      38.1-46.9    42.0-49.1
Balarambati       51.5-55.8    53.3-57.8      45.8-55.9    50.8-58.0
Bora              53.1-61.3    56.0-66.0      54.9-66.5    52.6-63.4
Begumpur          47.4-50.3    61.3-64.7      42.6-46.4    44.9-48.1

Note (1) The range is due to non-verified entries on the R-list.
Note (2) The figures are subject to a standard error of about 3 per cent.

One comment may be made in regard to the birth rate for 1946, which
appears to be very high. A possible explanation may be the improved
economic situation after the famine of 1943, and demobilization.
Another possible explanation is failure of the investigator to establish the
identity of entries in the R and I lists, but if this were so, it should be
more apparent for 1945, which it is not, as the birth rates for 1945 are
much lower. An improbable explanation is that each Union Board is
composed of extremely heterogeneous sections displaying negative
correlation between the probabilities of detection of events by the
Registrar and the interviewers.

Another comment should be made. The completeness of registration, 
recorded in Table III, is based on the number of correct entries in the 
R-list judged against the estimated total number of events. Official 
published rates in all countries are based on the total number of regis- 
trars’ entries, correct plus incorrect, and the usual practice of inflating 
official rates to correct for incompleteness of registration yields spurious 
results: the rates are already partly inflated owing to incorrect entries. 
Proper inflation (correction of rates) is possible only by comparing the 
registration lists with the results of a population survey and making 
estimates of the total number of events and the proportion of incorrect 
entries in the registration lists. 

The precision of estimated number of events. From the fact that the
coefficient of variation of a total estimated number of events is
$\sqrt{q_1 q_2 / (N p_1 p_2)}$, it will be seen that the lower the efficiency of
detection of an event on either the I or R-lists ($p_1$ or $p_2$), the greater the
standard error of the total. In this enquiry, in spite of the fact that local people
were hired and trained especially for this work, the efficiency of the
interviewing was not of high order: only 67.2% of the births in 1946
and 52.8% in 1946 were detected by the interviewers. The corresponding
percentages for deaths were 50.7 and 32.3. Methods of improving
the performance of the interviewers must be sought, and it appears
that the interval of time to be covered by the survey must not extend
too far back.

It is highly important to bear in mind that regardless of the
interviewers’ performance, the method proposed here for estimating the
total number, N, of events is not subject to bias,⁴ but poor performance
does increase the error of the estimate of N. It also increases the
standard error of the estimate of the registrars’ performance.

The coefficient of variation is also influenced by N. It is important
to note that N in the formula refers to any total: not just a total over
an area, but a total for any subgroup, such as the age or sex classes for
which an estimate is prepared. For the area and sex classes that were
used here, the standard errors of the estimated totals varied from 1 to
10%. Over a larger area, or over broader classes, the coefficients of
variation would be reduced by the presence of the factor $\sqrt{N}$ in the
denominator.

4 In making this statement the case of $p_1$ (or $p_2$) = 0 is considered trivial and is excluded.
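
The dependence of the coefficient of variation on N and on the two detection efficiencies can be tabulated with a short sketch. The function name and the trial values of N, p1 and p2 below are hypothetical; only the formula sqrt(q1 q2 / (N p1 p2)), with q = 1 - p, is taken from the text.

```python
import math


def cv_of_total(n, p1, p2):
    """Coefficient of variation sqrt(q1*q2 / (N*p1*p2)), with q = 1 - p."""
    if n <= 0 or not (0 < p1 <= 1) or not (0 < p2 <= 1):
        raise ValueError("need N > 0 and 0 < p <= 1")
    return math.sqrt((1 - p1) * (1 - p2) / (n * p1 * p2))


# Hypothetical trial values, not figures from the enquiry.
for n in (500, 2000, 8000):
    for p2 in (0.5, 0.8):
        cv = cv_of_total(n, p1=0.5, p2=p2)
        print(f"N={n:5d}  p1=0.50  p2={p2:.2f}  CV={100 * cv:.2f}%")
```

Quadrupling N halves the coefficient of variation, and raising either efficiency lowers it, as the paragraph states.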

Costs. A few words regarding the cost of this particular enquiry may 
be helpful in planning future enquiries. The cost of the field-work, in- 
cluding salaries and overhead charges, amounted to Rs. 4,000. The 
cost of tabulation and analysis amounted to Rs. 1,500. The total cost 
was thus Rs. 5,500 or about 1.4 annas (2 U.S. cents) per capita in the
area of enquiry. For various reasons (this being a pilot study and a com- 
plete listing of the population being desirable for other reasons), the 
entire population was covered without the introduction of sampling. 
In designing an enquiry for a larger area such as a province or even a 
district, sampling would be used. 
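
The per-capita figure can be checked roughly from numbers given elsewhere in the paper. This is a sketch under a stated assumption: the population base is not quoted in this section, so it is inferred here from the 1945 births for the whole Centre (2908) and the 1945 birth rate (46.1 per 1,000). The rupee is taken at 16 annas; the function name is hypothetical.

```python
ANNAS_PER_RUPEE = 16  # pre-decimal Indian currency


def implied_population(events, rate_per_1000):
    """Population base implied by a count of events and a rate per 1,000."""
    return events / rate_per_1000 * 1000


# Assumption: population inferred from 1945 births and the 1945 birth rate.
population = implied_population(2908, 46.1)

total_cost_rs = 4000 + 1500  # field-work plus tabulation and analysis
annas_per_head = total_cost_rs * ANNAS_PER_RUPEE / population

print(f"implied population: {population:,.0f}")
print(f"cost per head:      {annas_per_head:.2f} annas")
```

On this assumption the cost comes to a little under one and a half annas per head.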

For each area in the sample there can be calculated the total number
of events and the rate; also the efficiency $p_1$ of the registrar’s
performance. For each sample-area, supposedly completely canvassed (no
sub-sampling), there will be an error in estimating either the rate or the
registrar’s performance. The coefficient of variation in the rate will be
the expression already given earlier, viz. $\sqrt{q_1 q_2 / (N p_1 p_2)}$. Likewise, the
coefficient of variation of the estimate of $p_1$, the registrar’s performance,
is

$$\sqrt{\frac{q_1}{(C + N_2)\,p_1} \cdot \frac{N - C - N_2}{N - 1}}$$

Each symbol refers to the particular area covered. These errors are not
erased by taking a complete canvass. (As a matter of fact, the particular
enquiry described here was a complete canvass, yet subject to these
errors.)
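
The second coefficient of variation can be sketched in the same way. The sketch reads the expression as sqrt( q1 / ((C + N2) p1) * (N - C - N2) / (N - 1) ), with C + N2 the number of events on the I-list, and estimates p1 by C / (C + N2). In practice N is itself an estimate, so plugging it in is an assumption of this sketch. The function name and the counts are hypothetical.

```python
import math


def cv_of_registrar_efficiency(c, n2, n_total):
    """CV of the estimate of p1, using p1 = C / (C + N2).

    c       -- events found on both lists (C)
    n2      -- events found on the I-list only (N2)
    n_total -- total number of events N (an estimate in practice)
    """
    i_list = c + n2
    p1 = c / i_list
    q1 = 1 - p1
    return math.sqrt(q1 / (i_list * p1) * (n_total - i_list) / (n_total - 1))


# Hypothetical counts: C = 90, N2 = 30, N = 160.
cv = cv_of_registrar_efficiency(90, 30, 160)
print(f"estimated p1 = {90 / 120:.2f}, CV = {100 * cv:.2f}%")
```

The factor (N - C - N2)/(N - 1) vanishes when the I-list catches every event, in which case the registrar's efficiency is measured without sampling error.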

When sampling is introduced to study a whole District, the
estimation of the total number of events, the rates, and the over-all efficiency
of registration will be made by combining the data from a number of
sample-areas. An additional error is then introduced for a District as a
whole because of variability between the sample areas. The variability
between the rates of the individual sample-areas may be much smaller
than the variability between their total events, as it is usually difficult
to define sample-areas of equal populations. It follows that usually a
much smaller sample will provide a standard error of (e.g.) 4 per cent
in an over-all rate for a District than is required to provide the same
precision in the total number of events.

114 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1949

The cost of attaining (e.g.) a 4% error of sampling will depend on the 
particular design of sample that is used; and the design in turn will, for 
greatest economy, depend on the density and distribution of the popu- 
lation, on the variability of the birth and death rates over the area for 
which estimates are to be prepared, on the costs of purchasing or pre- 
paring maps and lists by which the sampling procedure may be for- 
mulated, on the quality of personnel available to carry out the work, 
etc. 

As a general principle, applicable to large populations, so far as the 
errors of sampling are concerned, the total number of cases (i.e., the 
total number of people, households, areas, or whatever unit constitutes 
the elements of sampling) to be included in the survey depends almost 
entirely on the precision of sampling that is desired in the estimation of 
the total number of events, or in the rate (whichever is the aim of the 
survey) and hardly at all on the total number of inhabitants in the area to 
be covered.⁴

In India, the birth and death rates should be estimated at least by 
the District (roughly 1 to 2 million inhabitants), and for smaller areas 
if funds would permit. Roughly speaking, to attain an over-all 
standard error of 5% (a reasonable aim for the present), the cost of a 
survey will run between Rs. 10,000 and 15,000 for a district. 


Additional information provided. A survey of this type also provides⁵
valuable ancillary information regarding other characteristics of the 
population such as size of family, age and sex distribution, marital 
status, occupation and industry, specific fertility rates, gross and net 
reproduction rates, and other information, but the list cannot be ex- 
tended indefinitely because the interest of the field workers must not be 
dissipated too far from the main aims of the survey. 


4 It is presumed in this statement that the physical facilities for sampling (maps, lists, personnel,
payment, etc.) are about the same over all parts of the area to be covered.

5 As a matter of fact, the surveys reported have provided most of these additional items, and the
cost mentioned includes them.


APPENDIX
THE STANDARD ERROR OF N

An approximate value for the standard error of

$$\hat{N} = \frac{(C + N_1)(C + N_2)}{C}$$

can be obtained by the application of the formula that the variance
$V f(x)$ of a function $f(x)$ of $x$ is approximately given by

$$V f(x) = \left(\frac{\partial f}{\partial x}\right)_E^{2} V(x)$$

where $(\ )_E$ denotes the substitution of the expected values for $x$
appearing inside the bracket after differentiation, and $V(x)$ denotes the
variance of $x$.

If $C + N_1$, $C + N_2$ and $N$ are fixed, it is known that the expected
value $E(C)$ and the variance $V(C)$ of $C$ are given respectively by

$$E(C) = N p_1 p_2$$

and

$$V(C) = N p_1 q_1 p_2 q_2$$

where

$$p_1 = \frac{C + N_1}{N}, \qquad p_2 = \frac{C + N_2}{N} \qquad \text{and} \qquad p_1 + q_1 = p_2 + q_2 = 1.$$

Under the same conditions, the variance $V(\hat{N})$ of $\hat{N}$ is

$$V(\hat{N}) = (C + N_1)^2 (C + N_2)^2 \, V\!\left(\frac{1}{C}\right)$$

which by the application of the formula given above reduces to

$$V(\hat{N}) = \frac{N q_1 q_2}{p_1 p_2}.$$

The standard error of $\hat{N}$ is therefore

$$\sigma_{\hat{N}} = \sqrt{\frac{N q_1 q_2}{p_1 p_2}}$$

approximately.
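
The reduction step can be written out term by term. This is a worked restatement of the argument above, not an addition to it; it assumes the estimator $(C + N_1)(C + N_2)/C$ and the fixed margins $C + N_1 = N p_1$, $C + N_2 = N p_2$.

```latex
\begin{align*}
V\!\left(\frac{1}{C}\right)
  &\approx \left(-\frac{1}{C^{2}}\right)_{E}^{2} V(C)
   = \frac{N p_1 q_1 p_2 q_2}{(N p_1 p_2)^{4}}, \\[4pt]
(C + N_1)^{2}(C + N_2)^{2}
  &= (N p_1)^{2}(N p_2)^{2} = N^{4} p_1^{2} p_2^{2}, \\[4pt]
V(\hat{N})
  &\approx N^{4} p_1^{2} p_2^{2} \cdot
     \frac{N p_1 q_1 p_2 q_2}{N^{4} p_1^{4} p_2^{4}}
   = \frac{N q_1 q_2}{p_1 p_2}.
\end{align*}
```

Dividing the resulting standard error by $N$ gives the coefficient of variation $\sqrt{q_1 q_2 / (N p_1 p_2)}$ used in the body of the paper.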








EVALUATION OF PARAMETERS IN THE 
GOMPERTZ AND MAKEHAM EQUATIONS 


J. F. BRENNAN 


This paper describes a technique for determining the mortal- 
ity characteristics of physical property through the use of ac- 
counting records of plant balances and yearly additions. 
Where time is not available for extensive actuarial research, 
the method produces results within tolerable limits of accu- 
racy. Its limitations are pointed out. 


THE ASSEMBLY OF the statistics needed for an actuarial analysis of
physical plant is a tedious and expensive task. Basic records are not
always adequate for this purpose, a circumstance which poses a
formidable research problem. The technique described herein was devised
in an effort to bypass this obstacle. The application of the method
requires only a money record of plant balances and of gross additions over
a period of years.

Evidently the plant balance at any time is the summation of the 
survivors from the gross additions of all previous years. If, using this 
principle, we attempt a direct determination of the parameters for a 
Gompertz or Makeham equation, we immediately encounter the diffi- 
culty of making summations, because of the compound exponential 
property of these functions. We may, however, expand these functions 
into convergent power series and by disregarding terms beyond the 
second or third power, often find an acceptable solution. In the de- 
velopment of the theory in the following paragraphs, I take the case of 
the Gompertz equation and its power series, limited to the square term. 
The extension of the technique to the Makeham equation and to higher 
terms of the series expansion will be obvious. 

The specific problem is to determine, on the basis of a set of data,
the parameters of a Gompertz equation:

$$y = k g^{c^{t}}$$

where t = age and y = survivors at age t.
The data are given in the form:

$$\begin{aligned}
P_1 &= Y_{11} \\
P_2 &= Y_{12} + Y_{21} \\
P_3 &= Y_{13} + Y_{22} + Y_{31} \\
P_4 &= Y_{14} + Y_{23} + Y_{32} + Y_{41}
\end{aligned}$$

where $P_4$ is the plant balance at the end of year four and $Y_{23}$ is the
amount of plant remaining from the gross addition of year two, 3 years
after installation, etc.

For simplicity in the development, there will be imposed the
condition that at age zero, the survivor curve passes through point (0, 1),
so that we show at each age the fraction surviving. Thus we write:

$$y = \frac{1}{g}\, g^{c^{t}}$$

Expanding y in a Maclaurin series gives:

$$y = 1 + GCt + \frac{1}{2}\, GC^{2}(G + 1)\, t^{2} + \frac{1}{6}\, GC^{3}(G^{2} + 3G + 1)\, t^{3} + \cdots$$

where $G = \log_e g$ and $C = \log_e c$.

Now if, as found in some applications, the values of G and C are such
that over the range $Y_0$ to $Y_t$, the sum of the cubic and higher order
terms is small compared to the sum of the first and second order terms,
then for that range the Gompertz equation may be approximated by

$$y = 1 - at - bt^{2}.$$

If coefficients a and b are determined from the data, the parameters
of the Gompertz Equation are given by:

(1)    $G = -\dfrac{a^{2}}{a^{2} + 2b}, \qquad C = -\dfrac{a}{G}.$
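
A short round-trip check may help in reading equations (1). The sketch matches coefficients between the parabola and the first two terms of the series, taking a = -GC and b = -(1/2)GC²(G + 1), and then inverts them with G = -a²/(a² + 2b), C = -a/G. The function names and the trial values of G and C are hypothetical.

```python
import math


def parabola_from_gompertz(G, C):
    """Coefficients of y = 1 - a*t - b*t**2 from the first two series terms."""
    a = -G * C
    b = -0.5 * G * C**2 * (G + 1)
    return a, b


def gompertz_from_parabola(a, b):
    """Equations (1): recover G = ln g and C = ln c from a and b."""
    G = -a**2 / (a**2 + 2 * b)
    C = -a / G
    return G, C


def gompertz_survivors(t, G, C):
    """Fraction surviving at age t: g**(c**t) / g, written with G and C."""
    return math.exp(G * (math.exp(C * t) - 1))


G0, C0 = -0.02, 0.10  # hypothetical parameters
a, b = parabola_from_gompertz(G0, C0)
G1, C1 = gompertz_from_parabola(a, b)

print(f"a = {a:.6f}, b = {b:.6f}")
print(f"recovered G = {G1:.6f}, C = {C1:.6f}")
for t in (5, 10, 20):
    exact = gompertz_survivors(t, G0, C0)
    approx = 1 - a * t - b * t**2
    print(f"t={t:2d}  Gompertz={exact:.5f}  parabola={approx:.5f}")
```

The gap between the two curves widens with age, which is the divergence the paper later warns about for the higher years of the range.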
To determine the values of a and b we proceed as follows: Let

(2)    $Y_{x,\,n-x} = 1.00 - a(n - x) - b(n - x)^{2}$

where $Y_{x,\,n-x}$ = the survivors in year n, as a decimal part of the
             installations made in year x (i.e., (n − x) years after
             installation),
      x = year of installation,
      n = a particular year, subsequent to year x,
      (n − x) = age at year n,
      a and b are constants to be determined.
If we sum up the survivors in year n out of the installations of all
previous years, we have

(3)    $P_{n+1} = \sum_{x} A_x \left[1 - a(n - x) - b(n - x)^{2}\right]$

where $A_x$ = gross additions made in year x,
      $P_{n+1}$ = capital in service (i.e. plant balance) at the end of the
      nth year (beginning of year n + 1).

Expanding (3) we obtain

       $P_{n+1} = \sum A - a\left[n \sum A - \sum Ax\right] - b\left[n^{2} \sum A - 2n \sum Ax + \sum Ax^{2}\right]$

or

(4)    $a\left[n \sum A - \sum Ax\right] + b\left[n^{2} \sum A - 2n \sum Ax + \sum Ax^{2}\right] = \sum A - P_{n+1}.$

Now evidently values may be fixed for a and b by choosing any two
different years n, and solving simultaneously the two resulting
equations (4). Thus many different sets of values for a and b can be obtained
by varying the selection of the two n’s. A least squares solution based
on a range of values of n is indicated.

For convenience let

       $U_n = \left[n \sum A - \sum Ax\right],$
       $V_n = \left[n^{2} \sum A - 2n \sum Ax + \sum Ax^{2}\right],$
       $Z_n = \left[\sum A - P_{n+1}\right].$

Then Equation (4) may be written

(5)    $Z_n = aU_n + bV_n.$

From a set of equations (5) the best values of a and b (“best” in the
least squares sense) are found from the normal equations:

(6)    $a \sum U^{2} + b \sum UV = \sum UZ,$
       $a \sum UV + b \sum V^{2} = \sum VZ,$

where the summations are carried out over such a range of values of
n as is appropriate to the approximation.

With the numerical values of a and b obtained by solving equations
(6) the survivor equation (2) may be conveniently written

(7)    $Y_t = 1 - at - bt^{2}$

where t = age = (n − x), $Y_t$ = survivors at age t as a decimal part of the
additions made in year 0 ($Y_0$ = 1.0).
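
The least squares step can be sketched as follows. This is a minimal sketch, assuming that years are numbered 1, 2, ..., that the sums for year n run over installation years x = 1 to n, and that the range of n used is the whole record. The function names and the additions record are hypothetical. The balances are generated from a known parabola so that the recovery of a and b can be checked.

```python
def fit_parabola(additions, balances):
    """Solve the normal equations (6) for a and b.

    additions[x - 1] is A_x for x = 1..T.
    balances[n - 1] is P_{n+1}, the plant balance at the end of year n.
    """
    suu = suv = svv = suz = svz = 0.0
    for n in range(1, len(additions) + 1):
        years = range(1, n + 1)
        s_a = sum(additions[x - 1] for x in years)
        s_ax = sum(additions[x - 1] * x for x in years)
        s_ax2 = sum(additions[x - 1] * x * x for x in years)
        u = n * s_a - s_ax                        # U_n
        v = n * n * s_a - 2 * n * s_ax + s_ax2    # V_n
        z = s_a - balances[n - 1]                 # Z_n
        suu += u * u
        suv += u * v
        svv += v * v
        suz += u * z
        svz += v * z
    det = suu * svv - suv * suv
    if det == 0:
        raise ValueError("normal equations are singular for this record")
    a = (suz * svv - svz * suv) / det
    b = (suu * svz - suv * suz) / det
    return a, b


def balances_from_parabola(additions, a, b):
    """Plant balances implied by the survivor parabola, as in equation (3)."""
    return [
        sum(additions[x - 1] * (1 - a * (n - x) - b * (n - x) ** 2)
            for x in range(1, n + 1))
        for n in range(1, len(additions) + 1)
    ]


# Hypothetical record of gross additions, with balances built from a known parabola.
additions = [100.0, 120.0, 90.0, 150.0, 80.0, 110.0, 95.0, 130.0]
balances = balances_from_parabola(additions, a=0.01, b=0.002)
a_hat, b_hat = fit_parabola(additions, balances)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}")
```

With a real book record the balances will not follow a parabola exactly, and the fitted a and b will depend on the band of years chosen.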

The values of G and C (and from them g and c) may now be obtained 
by substituting a and b in equations (1). The resulting Gompertz 
curve and the parabola, equation (7), should then be plotted on the
same graph. It may be found that the two curves exhibit important 
divergence in the higher years of the range employed in the solution. 
This indicates that the third order and higher terms of the power series 
should not have been neglected in Equation (2). To include higher 
order terms, however, would multiply the work of the solution tre- 
mendously, as will be obvious from the summations of equation (4). 

A satisfactory way out of this difficulty consists in calculating a 
number of points on the curve, Equation (7), and to them fitting a 
Gompertz curve by King’s method.¹ The range of points employed in
this procedure should be approximately the same as that used in deriv- 
ing constants a and b. 

Chart I illustrates the processes heretofore discussed. The two 
curves shown thereon are calculated from the same actual book record. 
First the constants in the parabola were fixed by the method proposed 
herein. Two Gompertz equations were then found: one by substituting 
the parabolic constants in Equations (1); and the other by the log- 
arithmic method of King. The latter method appears to give the better 
interpretation of the data, since, as seen on Chart I, it produces a curve 
coincident with the parabola over the range of the data employed. 

How well the Gompertz equation, found by King’s logarithmic 
method, defines the mortality characteristics of plant, may be judged 
from the results given in Table I, which shows how nearly the theo- 
retical capital in service, calculated from the derived Gompertz equa- 
tion, matches up with the actual book record. The example used is 
based on the same actual data as underlies Chart I and represents fixed 
capital investment in overhead electric conductors, to which yearly 
additions were erratic, i.e., not correlated with the amount of capital in 
service. The test is, therefore, especially significant. The standard error 
of estimate calculated from Table I is 7.7 or about ¼ of 1% of capital in
service during the last 5 years of the table.

A further test of this technique was made on a group composed of 
thousands of homogeneous units of equipment (gas meters). Having 
previously, by the conventional actuarial process, derived a Gompertz 
survivor curve for this group, a sample accounting record of gross addi- 
tions was assumed and a plant balance record calculated from the 
known Gompertz curve, covering a period of 30 years. The process de- 
scribed herein was then applied to the sample, King’s method being 
used for the extrapolations. The results are depicted on Chart II. 





1 See “Textbook of the Institute of Actuaries” or Winfrey, “Statistical Analyses of Industrial
Property Retirements” Bulletin 125, Iowa Engineering Experiment Station, Ames, Iowa. 

TABLE I
TEST OF DERIVED SURVIVOR CURVE

          Capital in service at beginning of year
                      (000 omitted)                     Deviation
Year          Actual            Theoretical          (absolute value)
1935          $2,817              $2,840                    23
1936           2,862               2,856                     6
1937           2,916               2,910                     6
1938           2,972               2,963                     9
1939           3,097               3,098                     1
1940           3,161               3,172                    11
1941           3,223               3,229                     6
1942           3,320               3,339                    19
1943           3,391               3,390                     1
1944           3,457               3,443                    14














The true average life is 30.4 years, while that derived by the proposed 
method is 30.2 years, giving an error of approximately six tenths of one 
per cent. This is a tolerable deviation in this type of problem. No doubt 
greater precision could have been achieved by the use of a cubic (or 
higher degree) equation in lieu of the second degree parabola, but the 
extra labor is not justified in this case. Some precision was sacrificed 
in rounding out the data to units of one thousand dollars. Greater re- 
finement is probably not warranted in the use of this process. 

In applying this method, one may be confronted with a reliable sta- 
tistical record covering a period of years, but may lack knowledge of 
the age distribution of the plant balance at the beginning of that period. 
Frequently in such cases a satisfactory solution can be made by esti- 
mating the average age of the beginning plant balance and treating it 
as a gross addition in the year corresponding to that age. Such a scheme 
has obvious frailty and should be employed only when no more logical 
process is possible. 

It is conceivable that the derived parabola may have a maximum at 
an age significantly greater than zero. In some applications also, it 
may be found that the parabola is concave upward. Such results indi- 
cate a lack of stability in mortality ratios in the period (band) of years 
employed. This condition is entrained by shifting retirement policies, 
changed economic conditions, lack of replacement material or replace- 
ments and additions made by substitution of materials or equipment 
having inherently different life characteristics. Sometimes these faults 
will vanish with the selection of a different band of years (range of n) 
for solving Equations (6). If such conditions persist, however, the 
method breaks down. 








ON THE “INFORMATION” LOST BY USING A t-TEST WHEN 
THE POPULATION VARIANCE IS KNOWN 


JOHN E. WALSH
The RAND Corporation 


This note calls attention to the use of the power function 
as a means of determining how much “information” is lost 
by using some other test in place of the most powerful test 
of a given hypothesis. As an example of the method, the case 
of using a t-test for the mean of a normal population with 
known variance is analyzed. 


INTRODUCTION 


IF TWO significance tests of the same hypothesis should happen to
have the same power function, these tests would furnish the same
amount of “information” about the hypothesis tested in the sense of
the Neyman-Pearson theory of testing hypotheses. Of course, it is
hardly to be expected that two different significance tests will have
exactly the same power function. In some cases, however, two
significance tests may have very nearly the same power function. Then the
two tests are said to furnish approximately the same amount of
“information” concerning the hypothesis tested.

If there exists a most powerful test for a given hypothesis,
“information” (in the sense of the Neyman-Pearson theory of testing
hypotheses) will be lost by using some other test rather than this most powerful
test. (For fixed sample size and significance level, a test is most
powerful if the values of its power function are greater than or equal to those
of the power function of any other test of the same hypothesis for the
particular alternative considered.) It may happen, however, that the
most powerful test (at the same significance level) has approximately the
same power function as the given test if the most powerful test is based
on a smaller sample size; i.e., a most powerful test based on m sample
values furnishes approximately the same amount of “information” as
the given test using n sample values (m ≤ n). Then it will be said that
n − m sample values are “wasted” or “lost” by using the given test rather
than the most powerful test. By convention, the value of m is allowed
to assume non-integral values; the values of the power function of the
most powerful test for non-integral m are found by interpolation from
the power function values for integral m. This procedure furnishes an
interpolated measure of the number of sample values “lost.”

The above procedure could also be carried out in terms of operating
characteristic (OC) functions rather than power functions. Since

(1)    (Power Function) = 1 − (OC Function),

however, the same value of m is obtained.

The value of 100m/n% is called the power efficiency of the given sig- 
nificance test. A discussion of power efficiency which contains an exact 
definition of when two power functions are to be considered equivalent 
(in the sense of furnishing the same amount of “information”) is given 
in [1]. From (1), the definitions and remarks of [1] are equally applicable 
to the case in which OC functions are used instead of power functions. 

As an example of application of the above method, let us consider a
sample from a normal population with unknown mean and known
variance. If it is desired to test the population mean with respect to a
given constant value, the most powerful one-sided and symmetrical
tests are based on the quantity

(2)    [(sample mean) − (given constant value)] / (population standard deviation).

Thus, if the Student t-test is used instead of (2), “information” will be
lost. This note presents an approximate expression for the number of
sample values “lost” for the cases of one-sided and symmetrical t-tests.

The example analyzed has statistical interest in itself. Many statis- 
ticians have probably wondered how much information is lost when 
this situation occurs. One possible application of the result would be 
to help in deciding whether to use the t-test or test (2) with the popu- 
lation standard deviation estimated from past information. The final 
decision, of course, would also depend on the reliability of the estimate 
of the population standard deviation, the cost of taking observations, 
and perhaps other considerations. The complete formulation and analy- 
sis of this situation, however, is not considered to be a problem of this 
note. 

Situations similar to the example analyzed here were investigated 
by Neyman in [2] and by Fisher in [3]. Fisher’s results, however, are 
based on estimate rather than power function considerations. 

2. Results for example: Let n sample values $x_1, \ldots, x_n$ be drawn
from a normal population with unknown mean $\mu$ and known variance
$\sigma^{2}$. Let us consider tests of whether $\mu$ differs from a given constant value
$\mu_0$ which are based on the t-statistic

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n(n - 1)}}{\sqrt{\sum (x_i - \bar{x})^{2}}}.$$

All the t-tests investigated in the note are of this type.
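
As a small check on the form of the statistic, the sketch below computes t as (x̄ − μ0)·sqrt(n(n − 1)) / sqrt(Σ(x_i − x̄)²) and compares it with the more familiar (x̄ − μ0)/(s/√n), where s is the sample standard deviation with divisor n − 1. The function name and the sample values are hypothetical.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """t = (mean - mu0) * sqrt(n(n-1)) / sqrt(sum of squared deviations)."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return (mean - mu0) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


sample = [5.1, 4.9, 5.6, 5.2, 4.7, 5.3]  # hypothetical observations
mu0 = 5.0

t = t_statistic(sample, mu0)
familiar = (statistics.mean(sample) - mu0) / (
    statistics.stdev(sample) / math.sqrt(len(sample))
)
print(f"t = {t:.4f}, familiar form = {familiar:.4f}")
```

The two forms agree because s² = Σ(x_i − x̄)²/(n − 1).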

For one-sided t-tests at significance level $\alpha$, or a symmetrical t-test
at significance level $2\alpha$, it is found that approximately

$$\tfrac{1}{2} K_\alpha^{2}\, n/(n - 1)$$

sample values are “wasted” by using a t-test rather than the
corresponding test of type (2). Here $K_\alpha$ is the standardized normal
deviate exceeded with probability $\alpha$; i.e., the function $K_\alpha$ is defined by

(3)    $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{K_\alpha}^{\infty} e^{-x^{2}/2}\, dx = \alpha.$

The above approximation to the number of sample values “lost” is
reasonably accurate for n ≥ 4 if $\alpha$ = 5 per cent, n ≥ 5 if $\alpha$ = 2.5 per cent,
n ≥ 6 if $\alpha$ = 1 per cent, n ≥ 7 if $\alpha$ = 0.5 per cent. The accuracy of the
approximation increases as n increases.

If n is not too small, the above results can be roughly summarized
by stating that $\tfrac{1}{2} K_\alpha^{2}$ sample values are “lost” by using a one-sided
t-test at significance level $\alpha$ or a symmetrical t-test at significance level
$2\alpha$. Table I contains values of $\tfrac{1}{2} K_\alpha^{2}$ for $\alpha$ = 5 per cent, 2.5 per cent, 1
per cent, 0.5 per cent.

TABLE I

APPROX. NO. OF SAMPLE VALUES “WASTED” USING
t-TEST WHEN VARIANCE KNOWN

          Significance Level
One-sided t-test    Symmetrical t-test    Sample Values “Wasted”
      5%                  10%                      1.4
      2.5%                 5%                      2.0
      1%                   2%                      2.7
      0.5%                 1%                      3.3
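
The entries can be recomputed from the standard normal distribution. This is a minimal sketch using Python's `statistics.NormalDist`; the function names are hypothetical. It evaluates (1/2)K² and, when a sample size n is supplied, the fuller expression (1/2)K² n/(n − 1).

```python
from statistics import NormalDist


def normal_deviate(alpha):
    """K_alpha: the standard normal deviate exceeded with probability alpha."""
    return NormalDist().inv_cdf(1 - alpha)


def values_wasted(alpha, n=None):
    """(1/2) K_alpha^2, or (1/2) K_alpha^2 * n / (n - 1) when n is given."""
    half_k2 = 0.5 * normal_deviate(alpha) ** 2
    if n is None:
        return half_k2
    return half_k2 * n / (n - 1)


for alpha in (0.05, 0.025, 0.01, 0.005):
    print(f"one-sided level {100 * alpha:4.1f}%: "
          f"K = {normal_deviate(alpha):.3f}, "
          f"wasted = {values_wasted(alpha):.2f}, "
          f"with n = 10: {values_wasted(alpha, n=10):.2f}")
```

The computed values agree with the table to within about 0.1 of a sample value.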











3. Derivations: Let us consider the one-sided t-test of $\mu < \mu_0$ at
significance level $\alpha$ and based on a sample of size n. Using a modification
of the normal approximation given in [4], it is found that the power
function values $\epsilon$ of the t-test are approximately determined by the
relation

$$K_\epsilon \approx K_\alpha - \frac{(\mu_0 - \mu)}{\sigma/\sqrt{n}} \left[1 - K_\alpha^{2}/2(n - 1)\right]^{1/2},$$

where the K function is defined by (3). This approximation to the
power function is reasonably accurate for n ≥ 4 if $\alpha$ = 5 per cent, n ≥ 5
if $\alpha$ = 2.5 per cent, n ≥ 6 if $\alpha$ = 1 per cent, n ≥ 7 if $\alpha$ = 0.5 per cent. The
accuracy of the approximation increases with n.

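Now consider the one-sided type (2) test of $\mu < \mu_0$ at significance level
$\alpha$ and based on a sample of size m. The power function values $\epsilon'$ of this
test are exactly determined by the relation

$$K_{\epsilon'} = K_\alpha - \frac{(\mu_0 - \mu)}{\sigma/\sqrt{m}}.$$

Hence the two one-sided tests will have approximately the same
power function if m is chosen so that

$$K_\alpha - \frac{(\mu_0 - \mu)}{\sigma/\sqrt{m}} = K_\alpha - \frac{(\mu_0 - \mu)}{\sigma/\sqrt{n}} \left[1 - K_\alpha^{2}/2(n - 1)\right]^{1/2},$$

i.e., so that

$$n - m = \tfrac{1}{2} K_\alpha^{2}\, n/(n - 1).$$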


Thus approximately $\tfrac{1}{2} K_\alpha^{2}\, n/(n - 1)$ sample values are “wasted” if
the one-sided t-test of $\mu < \mu_0$ at significance level $\alpha$ and based on n
sample values is used rather than the corresponding type (2) test.
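
The algebra between the matching condition and the final expression is short. It is written out here as a worked step, assuming the two relations $K_\epsilon \approx K_\alpha - \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\left[1 - K_\alpha^{2}/2(n-1)\right]^{1/2}$ for the t-test and $K_{\epsilon'} = K_\alpha - \frac{\mu_0 - \mu}{\sigma/\sqrt{m}}$ for the type (2) test.

```latex
\begin{align*}
\frac{\mu_0 - \mu}{\sigma/\sqrt{m}}
  &= \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}
     \left[1 - \frac{K_\alpha^{2}}{2(n-1)}\right]^{1/2}
  && \text{(equal power for every } \mu\text{)} \\[4pt]
\sqrt{m} &= \sqrt{n}\left[1 - \frac{K_\alpha^{2}}{2(n-1)}\right]^{1/2} \\[4pt]
m &= n - \frac{K_\alpha^{2}\, n}{2(n-1)} \\[4pt]
n - m &= \tfrac{1}{2} K_\alpha^{2}\, \frac{n}{n-1}.
\end{align*}
```

Since $\mu_0 - \mu$ cancels, the same m serves for every alternative, which is why a single number of “wasted” sample values can be quoted.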

By symmetry, approximately $\tfrac{1}{2} K_\alpha^{2}\, n/(n - 1)$ sample values are also
“lost” by using the one-sided t-test of $\mu > \mu_0$ at significance level $\alpha$ and
based on a sample of size n.

Now the power function of the symmetrical t-test of $\mu \neq \mu_0$ at
significance level $2\alpha$ and based on n sample values equals the sum of the
power function of the one-sided t-test of $\mu < \mu_0$ with significance level
$\alpha$ and sample size n plus the power function of the one-sided t-test of
$\mu > \mu_0$ at significance level $\alpha$ and sample size n. Likewise the power
function of the symmetrical type (2) test of $\mu \neq \mu_0$ at significance level
$2\alpha$ and based on a sample of size m equals the sum of the power
functions of the two one-sided type (2) tests (of $\mu < \mu_0$ and $\mu > \mu_0$) at
significance level $\alpha$ and sample size m. Thus approximately $\tfrac{1}{2} K_\alpha^{2}\, n/(n - 1)$
sample values are “wasted” by using a symmetrical t-test of $\mu \neq \mu_0$ at
significance level $2\alpha$ and based on a sample of size n.


WESLEY CLAIR MITCHELL, 1874-1948


AN APPRECIATION 


The death of Wesley Clair Mitchell brought to an end a lifetime 
of formative research, inspired scholarship, and earnest, continu- 
ous effort to apply scientific methods to social and economic prob- 
lems. The end was untimely not merely because it always comes too 
soon for those few useful and lovable members of mankind of whom 
Dr. Mitchell was such an outstanding example. It was even more un- 
timely because to the very end Dr. Mitchell retained the keenness of 
mind, the breadth of vision, the hospitality to new and pseudo-new 
ideas, and the kindliness to their often overconfident bearers—rare 
qualities even among scholars. To those of us who knew him and had 
the privilege of meeting him often, he seemed ageless and timeless. It 
is still difficult to realize that he is gone and will not be here to listen 
to our enthusiasms and complaints, to comment wisely and always with 
a charming humor upon some new quirk of the human mind, and to set 
before impatient younger generations further examples of broad 
scholarship and of respect for data and problems. 

In these few notes it is perhaps most appropriate to stress Dr. 
Mitchell’s work in statistics. Of the many who are familiar with his 
writings in recent decades only a few may realize how consistent was 
his interest and how continuous his research in the field of statistics. As 
a young graduate student at the University of Chicago at the end of the 
1890’s, his interest aroused by the monetary questions of the day, Dr. 
Mitchell was already contributing to the enrichment of quantitative 
knowledge by a series of articles on prices and by his work on the infla- 
tion experience during the Civil War—work that eventually resulted in 
two monumental treatises (published in 1903 and 1908). Upon comple- 
tion of his graduate training at Chicago (with one year in Germany and 
Austria), Dr. Mitchell spent 1899-1900 at the Bureau of the Census, 
when Allyn Young and Walter F. Wilcox were there. This early com- 
bination of fruitful use of data in the study of economic problems with 
active interest in public agencies responsible for social and economic 
statistics set a precedent consistently followed throughout his lifetime. 
As work on currency and monetary problems gradually gave way to 
the broader studies on business cycles, Dr. Mitchell continued to main- 
tain his active interest in and scrutiny of the basic data. There followed 
articles on the BLS index numbers of wages (QJE, 1911), on new
banking measures (JPE, 1914), and on possible improvements in the
statistical output of federal bureaus (Quart. Pub., ASA, 1915). From his
earliest productive years to the publication of that classical treatise on
Business Cycles (1913) there was a continuous interplay of analysis,
attempts at gathering and improving economic data, and efforts to raise
the quality of basic information available to scholars and to the
intelligent public at large.

This concern with bringing measurable facts to bear upon basic eco- 
nomic problems and with the need for critical scrutiny and evaluation 
of data made available by public agencies persisted throughout Dr. 
Mitchell’s life. In 1915, barely two years after publication of Business
Cycles, the BLS monograph on index numbers of wholesale prices ap- 
peared. This less well-known study, which was reprinted in 1921—an 
unusual distinction for a government report—is also a typical example 
of Dr. Mitchell’s scholarship and approach—in the care with which the 
efforts of earlier scholars in the field are reviewed and utilized, in the 
breadth with which the problem is conceived, in the scrupulous atten- 
tion paid to the characteristics of the available primary information, 
in the happy blend of insight and common sense with which the answers 
are provided and indeed the very questions formulated. 

During the country’s active participation in World War I, Dr. 
Mitchell served as Chief of the Price Section of the War Industries 
Board. Since he was always quite reticent about this period, one may 
surmise that it was not a happy one—for reasons which many scholars 
who passed through a similar experience in World War II can well 
understand. The pressure of urgent problems, the need for decisions 
made upon all too slim a factual basis, the tug and pull of various group 
and personal interests, hardly provided an atmosphere satisfactory to a 
scholar bent upon operating with wide and thoroughly weighed evi- 
dence. Yet several important and valuable results can reasonably be 
attributed to this experience. One was a clearer appreciation of the 
difficulties in the assembling of data and of research under government 
auspices—with some prescient suggestions for change, foreshadowing 
future reforms, made in Dr. Mitchell’s Presidential Address to this 
Association in 1918 (see JASA, March 1919). Another was the series 
of monographs on the history of prices during the war, of which two 
volumes appeared under Dr. Mitchell’s name. But perhaps the most 
important result of his war experience was the conviction that neither 
the university nor the government sufficed as loci of objective study of 
economic problems; and that a research institution, combining the 
continuity, theoretical interests, and the broad approach of the aca- 
demic scholar with attention to quantitative data and the more real- 
istic approach of government research, would plug a crucial gap and fill 
a badly needed want. It was this conviction that provided the initial
impetus to Dr. Mitchell and some of his wartime colleagues in the or- 
ganization of the National Bureau of Economic Research in 1920. 

Dr. Mitchell’s own research since the early 1920’s is closely associ- 
ated with the National Bureau, of which he served as research director 
until 1946 and as an active member of the staff throughout and until 
the last. He headed the team that made the basic study of national 
income in this country in 1922 and set the pattern for work in the field 
that has grown apace ever since. It was his work on business cycles, in 
the increasingly broad conception of it as a pattern of change in the 
whole economy, that provided the central theme for all the work of the 
National Bureau through the almost three decades of its existence. It 
was his inspiration that held the National Bureau to standards that it 
endeavored to maintain; that attracted to it a group of people who 
combined theoretical interests with a zeal for established and testable
evidence; and that kept the National Bureau from the temptation to 
take hard and fast positions on current and apparently pressing issues 
that were not warranted by the existing evidence. It was under the 
auspices of the National Bureau that Dr. Mitchell published his introduction
to a new study of business cycles, Business Cycles: The Problem
and Its Setting (1927), and the treatise on Measuring Business Cycles
(jointly with Arthur F. Burns, in 1946). A report dealing with stable 
and variant characteristics of business cycles, now in preparation for 
publication by the National Bureau, engaged his attention during the 
last three years of his life.

Impressive as is this list of contributions during the last quarter 
century, it is incomplete in several respects. Dr. Mitchell was part 
author of many of the National Bureau publications, either as a direct 
contributor (to Business Cycles and Unemployment, Recent Economic 
Changes), or by his assistance rendered in review and criticism, or by 
the example and inspiration set by his own work. He served as guide 
and counselor to many other research organizations and projects—as 
chairman of the Research Committee on Social Trends (1929-33), as a 
member of the National Planning Board and of the National Re- 
sources Board (1933-35), as a member of the Social Science Research 
Council (since 1927)—to list but a few. The last service he rendered 
the government was as chairman of the technical committee set up by 
the National Labor Board to review the controversy over the BLS cost 
of living index (1943-44). And those who knew him were all too aware 
of how much of his time and effort was spent in counsel and guidance, 
kindly and modestly extended to all scholars, young and old, who were 
seeking it in increasing numbers. 

The enriching influence of this lifetime of scholarship on research in
the social sciences, on the teaching of economics and related subjects, 
and on public policy is well recognized and hardly requires demonstra- 
tion. The ever widening use of statistical data and tools in the analysis 
of economic problems, the emphasis on the concrete institutional and 
historical framework within which societies live and function, the more 
scrupulous distinction between a recorded observation and plausible 
assumption, are all comparatively recent trends in the economic and 
social disciplines in this country and elsewhere. To this quickening of 
the searching spirit in the study of society, Dr. Mitchell’s own investi- 
gations and those directed by him, were a major impetus. It should 
also be noted, for the benefit of those who are concerned with direct 
and practical utility, that the return to society from such effort is far 
greater than may appear on the surface. Its value to society is not 
only the obvious one of making possible more intelligent solutions of 
social and economic problems because more is known about the func- 
tioning of the economy—a clear illustration of this was provided in the 
work of our economic agencies in World War II. Its even greater, if 
less obvious, value lies in the spread of the spirit of inquiry and of the 
respect for facts, which impose desirable limits on the “mutable Minds, 
Opinions, Appetites and Passions of particular Men.” 

Yet the study of society through the use of statistical and other 
testable evidence is far from an easy task. Those who have wrestled 
with the complexity and variability of observable economic life and 
with the imperfect and treacherous data available on social phenomena, 
know the courage, patience, and sheer moral stamina required in this 
struggle and the unusual capacities for organization, analysis, and synthesis
needed to bring order out of chaos. It would be useless, and perhaps
impertinent, to inquire by what turn of the wheel that determines
heredity and environment Dr. Mitchell was endowed so richly with
all these qualities. But it is important to indicate, as well as one can, 
the leading ideas and the broad attitudes that assisted him throughout 
his lifetime. 

These ideas or attitudes may be briefly stated under three heads. 
First was the conviction that the human mind is infinitely productive 
of hypotheses or models and that, from its rich endowment, it proceeds 
to originate them in profusion—regardless of the extent to which they 
are anchored in testable evidence. This conception led Dr. Mitchell to 
approach the products of the human mind with both respect and cau- 
tion—respect for the rich insight and small modicum of experience 
that they may embody, and caution in accepting the wide interpreta- 
tion and inference that are almost inevitably attached to the products. 
There is a revealing discussion of this attitude in Dr. Mitchell’s letter
to J. M. Clark (see Methods in Social Science, edited by Stuart A. Rice, 
Chicago, 1931, pp. 674-80). It was an attitude particularly helpful in 
the field of social study in which group interests and passions tend un- 
consciously to color the hypotheses or models originated with such ease 
and with such a claim of finality. 

Second, and equally important, was the dominant notion of inter- 
relation in space and continuity in time as the basic characteristics of 
social and economic life. It is significant that the whole line of evolution 
in Dr. Mitchell’s work is from prices (under the specific angle of in- 
flation), to business cycles, and to the study of economic change at 
large. But this semblance of evolution is deceptive, since in his early 
investigations Dr. Mitchell already fully recognized that the study of 
any one part is in effect the study of the whole from a particular angle. 
This basic idea, in combination with the critical attitude toward 
man’s theorizing, resulted in a natural emphasis on testable evidence 
and on an approach that, however meticulous with regard to the parts, 
never lost sight of the whole. The first attitude made Dr. Mitchell an 
empiricist; the second made him a synthesizer in the best sense of the 
word. The first helped him to resist the temptation to escape into the 
quiet haven of imaginative models and ‘caeteris paribus’es; the second 
helped him to avoid refuge in the details of empirical work and kept 
him from indulging in the perfectionist’s delight of whittling at 
minutiae until the Greek Kalends. 

But there was a third and perhaps most basic idea—that there was 
some order in the ceaseless change and variance of economic phe- 
nomena; and that the patient building up of testable quantitative data, 
accompanied by the cautious and critical use of theories as hypotheses, 
might reveal the invariant elements. It is this idea that illuminated 
Dr. Mitchell’s work with a steady glow, that served as a powerful 
magnet around which the detailed findings in his treatises arranged 
themselves in a comprehensible pattern. And it is the quest for this 
underlying order that provided the powerful drive in this long life of 
search and research—in the belief that as the pattern is gradually re- 
vealed and its concrete manifestations recognized, it will be accepted 
by human intelligence as the basis for action on social and economic 
problems. 

Wesley Mitchell would have been the first to protest against such 
analysis in what he would consider grandiloquent terms: he was a 
modest and humble man—with a humility that, like all genuine humil- 
ity, verged on pride. And I am writing these lines reluctantly. My only
justification is that it is important to realize what guiding ideas were
helpful in a lifetime of fruitful and fundamental scholarship. It is im- 
portant to recognize how strong today is the temptation to withdraw 
into the security of imaginary models, only distantly relevant to his- 
torical reality—regardless of how mathematically elaborate such 
models may be. It is important to see how ever present is the opposite 
temptation—to elaborate and check details without concern as to their 
place in the broader framework. And while these two lines of intellectual
pursuit are moderately useful in the development of science, the
third direction, equally tempting—to give up hope of finding any in- 
tellectual order and to resolve the problem by withdrawal into esthetic 
fancy or intellectual cynicism—can be of negative value alone. Dr. 
Mitchell’s life is an inspiring demonstration of how effectively such 
temptations can be combatted, and how the spirit of objective inquiry 
can yield rich results in the study of human society. 

To those who knew him well and to those who knew him slightly 
the passing of Wesley Mitchell is a great and numbing loss. These 
notes can give no idea of his personality, the quiet and often playful 
wit of his conversation, the genuineness of his moral seriousness, the 
consistent drive of his interests, the warmhearted attitude to and the 
broad tolerance of fellow men. His contributions to our knowledge of 
social phenomena, to the data on the basis of which more intelligent 
policy decisions are possible, to the training of a large group of scholars 
in the field, will stand; and, one may hope, will provide a foundation 
upon which work will be carried forward. But the personal loss is ir- 
retrievable, and we are all the poorer for it. 

SIMON KUZNETS


Fraction-Defective Charts for Quality Control. British Standards Institution.
British Standard 1313: 1947. London S.W. 1: British Standards Institution
(28 Victoria St.), 1947. Pp. 40. 6s. Paper. (New York 17: American Standards
Association [70 East 45th St.] $2.25.)


REVIEW BY ALBERT H. BOWKER
Assistant Professor of Statistics, Stanford University 
Stanford, California 


As the title implies, this standard describes control charts for fraction
defective and is a revision and extension of a small part of British 
Standard 600R:1942 Quality Control Charts reviewed by Harold A. Freeman 
in this JOURNAL (Sept. 1945, p. 386). The previous standard discusses, in
addition to fraction defective, charts for mean, standard deviation, and sev- 
eral other statistics computed from continuous measurements. Further, it 
utilizes more technical knowledge of statistics than the present pamphlet, 
which emphasizes the applied side. 

The limitation of the subject matter to the fraction defective chart, the 
omission of a discussion of the statistical principles behind the chart, and 
the internal organization of the pamphlet are designed to facilitate initial 
application of the method. The first section, entitled “The First Control 
Chart,” suggests the application of a simple chart in a rigidly prescribed way. 
It recommends the selection of a product containing about 7% or 8% de- 
fective, using samples of twenty, sampling about 5% of the product, basing 
the process average on 25 samples, and using an upper limit exceeded with 
probability .005. Later sections discuss possible variation in amount of 
sampling, size of sampling, sampling interval, and probability limit. Other 
types of charts are discussed, including control charts based on a given 
standard rather than on an empirically-determined process average; two-way 
control charts, in which a separate control chart is kept for the per cent of 
items which exceed the upper limit and those which fall below the lower 
limit of an allowable range; and compressed limit charts for use when the 
fraction defective is very small. In this latter case, the control chart is based 
on the number of defectives outside gauge limits more stringent than the 
specification limits. 
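
As an illustration of the arithmetic behind the "first control chart" recipe described above, the following Python sketch estimates a process average from 25 samples of twenty and finds an upper probability limit on the number of defectives per sample. The defect counts, the function names, and the reading of "exceeded with probability .005" as a binomial upper tail of at most .005 are assumptions made for this illustration; the pamphlet's own tables and conventions may differ.

```python
from math import comb

def binomial_upper_tail(n, p, c):
    """Return P(X > c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1, n + 1))

def upper_probability_limit(n, p, alpha=0.005):
    """Smallest count c with P(X > c) <= alpha (an assumed convention)."""
    for c in range(n + 1):
        if binomial_upper_tail(n, p, c) <= alpha:
            return c
    return n

# Hypothetical defect counts for 25 samples of 20 items each (not from the pamphlet).
counts = [1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 3, 1, 2, 0, 1, 2, 4, 1, 0, 2, 1, 3, 1, 2, 2]
n = 20

# Process average: total defectives divided by total items inspected.
process_average = sum(counts) / (n * len(counts))
limit = upper_probability_limit(n, process_average)
print(f"process average = {process_average:.3f}, upper limit = {limit} defectives")
```

Under this assumed convention, a sample whose count of defectives exceeds the computed limit would be treated as a signal calling for investigation.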

The discussion of the first control chart assumes that the product to which 
the chart is applied is already in control and discusses appropriate action 
when an occasional point falls on or beyond the control limit. A considerable 
number of cases have been reported in American literature in which the 
initial application of control charts leads to the discovery that the process 
is badly out of control, with, in some cases, a majority of the points falling
outside the control limits. Statistical control is apparently achieved only 
after a lengthy study and modification of the process. A description of this 
phenomenon might keep initial users from becoming discouraged if their in- 
dividual sample qualities fail to cluster neatly around the process average, 
as in the examples provided here. 

The discussion of control limits based on a given standard assumes the 
specification of a maximum permissible average level for defectives. Control 
limits are found by treating the specified percentage defective as the process 
average. It is clear that, if articles are produced with quality equal to this 
specified average, only an occasional point will be outside the control limits. 
Indeed, the presented quality would have to be a great deal worse than this 
maximum permissible average level before we could state with high prob- 
ability that an out-of-control point would be obtained. Thus, the terminol- 
ogy “maximum permissible” is somewhat confusing. 
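
The point about the "maximum permissible" level can be made concrete with a rough calculation. The Python sketch below assumes samples of twenty, a specified level of 7.5% defective, and an upper limit of 5 defectives per sample; these are illustrative values, not figures taken from the standard. It computes the chance that a single sample plots beyond the limit for several true quality levels.

```python
from math import comb

def signal_probability(n, p, limit):
    """P(X > limit) for X ~ Binomial(n, p): chance one sample plots beyond the limit."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(limit + 1, n + 1))

n, limit = 20, 5  # assumed sample size and control limit (illustrative only)

# Compare the specified level with quality two and four times worse.
for true_p in (0.075, 0.15, 0.30):
    prob = signal_probability(n, true_p, limit)
    print(f"true fraction defective {true_p:.3f}: P(out-of-control point) = {prob:.4f}")
```

Under these assumptions, the chance of an out-of-control point is still below one in ten when the true fraction defective is twice the specified level, and it exceeds one half only at about four times that level, which is the behavior the review describes.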

In all the examples in the pamphlet, only upper limits on fraction de- 
fective are included. Customary American practice is to use both upper and 
lower limits. However, the reader is advised to investigate the cause of a 
consistent run below the process average. Another departure from common 
American practice is the use of probability limits, as opposed to 2σ or 3σ
limits. Further, the expected number of defects per sample is less than some 
American authorities recommend or imply. 
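
For comparison with the probability limits used in the pamphlet, the 3σ limits customary in American practice can be computed from the usual binomial standard deviation of a sample fraction, sqrt(p(1-p)/n). The sketch below is illustrative only; the process average of 7.5% and the sample size of twenty are assumed values chosen to resemble the first chart described earlier in this review, and the function name is hypothetical.

```python
from math import sqrt

def three_sigma_limits(p, n, k=3.0):
    """Return (lower, upper) k-sigma limits for a sample fraction defective."""
    sigma = sqrt(p * (1 - p) / n)
    lower = max(0.0, p - k * sigma)  # a fraction cannot fall below zero
    upper = min(1.0, p + k * sigma)  # nor rise above one
    return lower, upper

p, n = 0.075, 20  # assumed process average and sample size
lower, upper = three_sigma_limits(p, n)
print(f"expected defectives per sample = {n * p:.2f}")
print(f"3-sigma limits on fraction defective: {lower:.3f} to {upper:.3f}")
```

With so few expected defectives per sample, the lower 3σ limit is clipped at zero, so only the upper limit can ever be crossed. This may be one reason a small expected count leaves a lower limit with little to say.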

The pamphlet is well done and has several desirable features. The direc- 
tions for setting up charts are very clear; the use of symbols has been almost 
entirely avoided; the instructions are reproduced conveniently in sample in- 
struction sheets at the end of the pamphlet; there are several examples with 
practical advice; and there are references for further study. 

The reviewer agrees with the review cited in the first paragraph, which 
concludes: “It is possible that this particular job has now been done well 
enough. The reviewer would welcome a pamphlet on the statistical theory 
on which quality control rests for this is not quite as obvious and ironclad 
as these excellent applied publications make it out to be.” 


Principles of Counting and Probability. J. C. Abbott (Associate Professor of 
Mathematics), and T. J. Benac (Associate Professor of Mathematics). (United
States Naval Academy, Annapolis, Md.) Annapolis, Md.: U.S. Naval Institute, 
1947. Pp. iii, 40. Paper. $1.00. 


REVIEW BY HERBERT SOLOMON 
Air Intelligence Specialist, Air Intelligence Division 
Headquarters, United States Air Force, 
Washington 25, D.C. 


This booklet is intended primarily for students in naval science, particularly
naval gunnery. However, the illustrations and exercises are directly
analogous to those encountered in aircraft gunnery and bombing procedures. 
The booklet is divided into two chapters: I, “Principles of Counting” and
II, “Probability.” Answers are given at the end of the text to the 282 prob- 
lems posed at intervals in the booklet. An attempt at every fourth exercise 
revealed no irregularities in the published answers. 

Chapter I has some faint rumblings of set theory but mainly presents, in 
a very brief manner, material on permutations and combinations which can 
be found in any of the standard textbooks. Chapter II presents some of the 
fundamental theorems of probability in a simple non-rigorous manner which 
is, no doubt, exactly the intention of the authors. This lack of rigor will of 
course present no difficulties in working out the exercises or understanding 
the illustrations. The only probability distribution discussed is the binomial 
distribution, and few of its characteristics are studied. The multinomial
distribution is left to the exercises. A noticeable omission is the normal
distribution, which plays a very important part in fire control studies in
military and naval science. 

As mentioned before, this booklet is designed for a rather special group, 
but except for its illustrations and exercises the material contained can also 
be found in many easily accessible books on algebra and mathematical prob- 
ability. 


Business Cycles and Forecasting, Third Edition. Elmer Clark Bratt (Professor 
of Economics, Lehigh University, Bethlehem, Pa.). Chicago 4, Ill.: Richard D. 
Irwin, Inc. (332 South Michigan Ave.), 1948. Pp. xii, 585. $6.00. 


REVIEW BY WILLIAM A. SPURR 
Professor of Business Statistics, Stanford University 
Stanford, California 


This book is the first satisfactory general survey work published in its
field since the war. Professor Bratt has condensed and considerably revised
his last edition of 1940, which had suffered the rapid obsolescence
typical of this period. 

Like its predecessors, this edition begins with a brief treatment of season- 
ality and long term trends, and then covers the whole gamut of business 
cycles—their measurement, causes, theories, history, barometers, projection 
methods and proposals for stabilization. A postwar book of such scope has 
very real value for both the student and the business man. 

The topic which has required the most complete revision since 1940 is that 
of business indicators or “barometers” (Chaps. 15-17). The concept of 
gross national product is now given central importance (at the expense of 
other general business indicators), and a method of projecting its compo- 
nents is offered later as the core of a five-step “effective program for business- 
cycle forecasting” (pp. 437-443). An actual case illustration, however, is 
needed to clarify this very promising method. 

Another “one of the distinctive features of this third edition” which “adds 
leverage to the analysis” (p. v), is the treatment of secondary trends separately
from secular trends. The concept of the secondary trend (which is related
to the long cycles or intermediate trends described so variously by
Wardwell, Kondratieff, Juglar, Kitchin, Silberling and others), however, 
remains a shadowy one, as it does in Burns and Mitchell’s Measuring Busi- 
ness Cycles (New York: National Bureau of Economic Research, 1946, 
Chap. 11). The results are therefore inconclusive. Bratt properly points
out that “regularity of recurrence of the secondary trend [is] a completely 
undemonstrated conclusion” (p. 71)—thereby avoiding a basic fallacy of 
Dewey and Dakin (Cycles. New York: Henry Holt and Co., 1947)—so that 
“no consideration can be given to the forecast of the secondary trend” (p. 
77). Later, though, he seems unduly pessimistic in concluding that therefore 
secular trends may not be projected (p. 77). 

The treatment of business forecasting (Chap. 18) is considerably condensed 
from that of the revised edition (Chaps. 21-22). One misses a detailed dis- 
cussion of leads and lags, specific historical analogy and several other tradi- 
tional forecasting procedures that may still have some validity today. 
Furthermore, the pessimistic statement (p. 420) that “we have practically
no basis whatsoever for forecasting originating causes” (such as acts of 
government and wars) seems to be countered by the partial success of such 
Washington forecasters as Cherne and Kiplinger and by Bratt’s own section 
on “Measurable Effects of Originating Causes” (pp. 401-402). 

Other parts of the book are revised less radically. Seasonality, secular 
trends (including growth curves) and several methods of analyzing time 
series are presented in readable, nontechnical fashion. A preliminary chapter 
on “Concepts of Balance” in the revised edition has properly been omitted. 

The detailed treatment of factors responsible for the cyclical nature of 
business (Chaps. 5-6) remains “the central part of the analysis” (rev. ed.,
p. vi), and provides a valuable analytical description of the course of a 
typical cycle. Other sections of particular interest are those on the distinc- 
tion between difference and summation series (pp. 90-92) and on measures 
of business confidence (pp. 416-418). 

The eclectic survey of business cycle theories (Chaps. 7-9) has been con- 
densed and simplified since the 1940 edition, but is basically unchanged. 
The treatment here is perhaps not quite as lucid as in Estey’s Business 
Cycles (New York: Prentice-Hall, Inc., 1941) but is more comprehensive.

The history of business cycles has been brought up to date through the 
end of 1946 (Chap. 14). The concluding section on parallels between World 
Wars I and II (pp. 350-352) is a provocative one which would justify more 
penetrating and detailed treatment. 

The final section on proposals for stabilizing business cycles has been expanded
in line with the growing importance of this problem. The recent
work of the President’s Council of Economic Advisers is included. 

The chief virtue of the book as a whole is its broad, impartial survey of 
all the main aspects of business cycles, reflecting the author’s accumulated 
experience in preparing three editions of this work. This reviewer fully subscribes
to Bratt’s basic policy of separating time series into secular, seasonal
and cyclical-random elements, and his use of the analytical approach to 
business cycles developed by Wesley C. Mitchell in his 1913 and 1927 clas- 
sics. The third edition is thoroughly modernized and is well organized. The 
faults are minor. While it is sometimes obscure or superficial, it provides 
many references to primary sources as guides for more intensive study. The 
condensation in general is an improvement, though the printer’s crowding 
of more words per page makes for slightly more difficult reading than before. 
This book is recommended both as a text and as a general reference work. 


Theory of Experimental Inference. C. West Churchman (Associate Professor of 
Philosophy, Wayne University, Detroit 1, Mich.). New York 11: Macmillan 
Co. (60 Fifth Ave.), 1948. Pp. xi, 292. $4.25. (London W.C. 2: Macmillan & Co., 
Ltd. [10 St. Martin’s St., Leicester Sq.] 21s.) 


REVIEW BY JOHN W. TUKEY
Assistant Professor of Mathematics, Princeton University, Princeton, N. J. 
Member of Technical Staff, Bell Telephone Laboratories, Murray Hill, N. J. 


The author has tried to write a challenging book—and has succeeded. By
mixing modern statistical inference and classical philosophy he has
written a book which could serve to introduce statisticians to philosophy 
and philosophers to modern statistical inference. The book is a far from 
perfect tool for either job, but it will, in this reviewer’s opinion, have sub- 
stantial influence both on philosophers and on statisticians. It is, however, 
the meaning of the book to statisticians interested in the foundations of 
their subject which is the chief topic of this review.

The first three chapters are devoted to a discussion from the point of view 
of formal science and philosophy of modern statistical inference, as exemplified
by the Neyman-Pearson theory, and a brief discussion of its relations
with scientific method. The next chapter outlines a formal classification of 
systems of philosophy, according to their views on knowledge, into rational- 
ism, naive empiricism, statistical empiricism, criticism, relativism and, 
finally, experimentalism. The first five schools are each the subject of a 
chapter, while experimentalism, founded by E. A. Singer and supported by 
the author, is discussed in four chapters. The book concludes with three 
chapters relating inference with social groups, social purposes, and a proposed
science of ethics.

The most striking point to the statistician who is concerned with the 
foundation of his subject and who believes that the millennium is still far 
away is the complete acceptance by the author of a definite methodology of 
statistical inference as the methodology. The reviewer feels that the present 
methodology of statistical inference has been significantly biassed by desires 
for (i) analytic manageability, (ii) mathematical simplicity, and (iii) un- 
warranted uniqueness. The philosopher and the statistician now need to col- 
laborate in working toward the ideal basis of statistical methodology. 

Another striking point is the insistence of the author on security. He admits
that complete freedom from risk is impossible in practice, but he insists
that the possibility of reducing the risk to an arbitrarily small value is
essential. Again: “And he must be sure about these things, or else he would 
find it impossible to act efficiently; he cannot even entertain the notion that 
there are risks involved in his decisions, for if such doubts creep in, he finds 
it impossible to act quickly and efficiently” (p. 236). This sentence is supposed
to refer to everyday decisions, but it expresses the author’s general
philosophy. 

Finally: “. . . we must have criteria of the most efficient methods of solving 
problems before we can give responses to any questions” (p. 243). This 
statement is the more surprising when we recall that a “response” to the 
author is only an approximate step toward an answer. 

The author’s discussion of presuppositions and their importance in sta- 
tistical inference (p. 12) should be read by all statisticians interested in 
methodology. 

The author holds that: “In a sense, the problem of the best ‘design’ of an 
experiment is exactly the problem of the philosophy of science. . .” (p. 21). 
This is closely related to his quotation from R. A. Fisher: “The more thor- 
ough the design of the experiment, the more meaningful is the question 
asked” (p. 208). The author and the reviewer are in agreement as to the 
validity and importance of these quotations, but we draw differing conclu- 
sions. The author concludes that design of experiment transcends statistics, 
while the reviewer concludes that the philosophy of science is a part of 
statistics, since he defines statistics as: “The science, the art, the philosophy, 
and the technique of drawing conclusions from the particular to the general.” 
Leaving this difference aside, it follows that any adequate account of the 
design of experiment must include serious attention to the philosophy of 
science. The author is led to the following strong statement: “But there 
should be realization in statisticians’ minds that they have pushed their 
basic problem beyond the field of formal statistics when they attempt to set 
down the criteria of best test. The danger of not realizing this point lies in 
the possible action that will result when a formally defined criterion of best 
is taken to satisfy nonformal demands of the science of value” (p. 283). 

The author’s solution of the philosophical problem posed by randomness 
is: “We would not be able to find randomness in our observations had we not 
first put it there in some form” (p. 124). This is consistent with his desire for 
security and his faith in a future physics without indeterminacy (pp. 77 and 
231-233). 

Since the author’s philosophy does not provide explicitly for the criticism 
of statistical presuppositions, it is not particularly surprising that, on page 
284, he holds that tolerance limits can be set with equal validity from small 
as from large samples. For this would be correct if the usual presuppositions 
were correct. 

At a more philosophical level, the author concludes that science, in any 
sense, can only exist when nature is regular—“That is, the meaning of an 
observation presupposes a principle of regularity in nature” (p. 128). This 
position must, it seems to the reviewer, be accepted. But the author goes on
to insist that:—“The reason that the relative frequencies must approach 
some limiting value is that the question of probability is otherwise meaning- 
less; one is ‘guaranteed’ that they do by the natural image which is presup- 
posed in all experimental problems” (p. 203)—and to insist that: “The 
fundamental postulate of experimentalism, therefore, is the following: There 
exists a formalization of nature, such that stochastic limits exist for certain se- 
quences of mathematical functions of the observations which are pertinent to a 
given question of fact” (p. 178). This seems to the reviewer an overstrong and 
unwise requirement of security. The pertinence of the observations is, ac- 
cording to the author, to be settled by “formal” methods: “. . . the justifi- 
cation for assuming that a certain set of actions produces pertinent observa- 
tions depends upon theoretical (formal) considerations on the part of the 
experimenter. These considerations must be presupposed by him in conduct- 
ing his experiments. The more aware he is of the nature of these presup- 
positions, the more exact is his experimental method” (p. 271). In the same 
vein, the author holds that “every statistical hypothesis should be a consequence 
of a formal theory of nature” (p. 218). The direction of this proposition is un- 
doubtedly good, but it goes much farther than the reviewer would care to go. 

In his approach to control, the author emphasizes the stochastic limit 
again: “... an experiment is said to be controlled if we state all the formal 
conditions under which a mathematical function of a series of observations ap- 
proaches a limit stochastically” (p. 182). This strong definition is then used 
in: “No question of fact can be said to have meaning unless there exists a con- 
trolled experiment for its answering” (p. 183). The reviewer feels that this is 
a roundabout way to say that no question of fact has a meaning. 

In discussing the adequacy of formal probability theories, the author 
seems to confuse “determination in theory” and “determination in practice.” 
He demands: “Let $O(x_1, x_2, \ldots, x_n)$ be any random sample, with known 
elementary probability law; let t be any statistic of the sample with degrees 
of freedom at least 1; then the theory should be able to state the elementary 
probability law of t” (p. 19), and then he asserts (pp. 19 and 30) that this 
demand has not been met in the present probability theory. While the math- 
ematical statistician may not be able to provide a compact and usable answer 
to many problems of distribution, he can provide a systematic and finite 
process for determining the distribution within any preassigned limits. In 
the author’s sense of “answer” it seems to the reviewer that modern proba- 
bility theory provides “answers” to all problems of distribution involving a 
finite number of observations. 

In the author’s discussion of the philosophy of science, this reviewer was 
struck by the statements that (i) “We may find it methodologically profit- 
able to keep contradictory tenets within science” (p. 192); (ii) “There is no 
true beginning-point to science” (pp. 209-210); (iii) “The time has come to 
recognize the circularity, or spiral form, of science, and the complete inter- 
dependence of the sciences” (p. 216); (iv) “Hence, science demands a science 
of efficiency, and cannot establish such a theory within psychology or the 
science of social groups. The science of ethics, for such we call the measure 
of loss, must on the one hand belong to experimental science, and yet not 
be an aspect of any of the special disciplines now recognized” (p. 250). With 
the first three of these the reviewer is in hearty accord, on the fourth he feels 
uninformed. 

The real difference between the author and the reviewer is in their ap- 
proach to models, whether mathematical or formal. The author is prepared 
to take a model on its face value, apparently without consideration of its 
weak points. It does not seem to the reviewer that this is how science has 
made its great gains by the use of models. It is by combining a working 
model with a more detailed, and probably unmanageable, model which indicates 
the soft spots of the working model that science has progressed. 

Passing now from matters of opinion to matters of fact, there are a few 
specific points. On page 7, the author states that, when the null hypothesis 
holds and sample size increases to infinity, “t will have a limiting value of 
zero.” This is incorrect. On page 35, the author states that the problem of 
confidence intervals for means of later samples is “the so-called problem of 
Tolerance Intervals.” This is a slip. On page 257, the author suggests that 
the best test is obtained by minimizing the integral of the risk over the param- 
eter space. This is not invariant, and of doubtful utility. A similar difficulty 
occurs at the top of page 211. The wording of the next to the last paragraph 
on page 9 and the two-valued use of n on page 16 seems sloppy. On page 12, 
the author asserts that “we could—find a best test” for slippage with an ar- 
bitrary continuous distribution. The reviewer would be interested in the 
definition of “best” and the resulting test. 
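
The first of these points can be checked numerically. The sketch below is not taken from the book or from this review; it assumes normal observations with true mean zero, and the function names are hypothetical. Under the null hypothesis the sample mean tends to zero, but Student's t rescales it by the square root of n, so the spread of t stays near one however large the sample becomes.

```python
import math
import random
import statistics


def t_statistic(sample, mu0=0.0):
    """One-sample Student t for the hypothesis that the mean equals mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return (mean - mu0) / (sd / math.sqrt(n))


def spread_of_t(n, replications=400, seed=1):
    """Standard deviation of t over repeated simulated null samples of size n."""
    rng = random.Random(seed)
    values = [
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return statistics.stdev(values)


if __name__ == "__main__":
    # The spread does not shrink toward zero as n grows.
    for n in (10, 100, 1000):
        print(n, round(spread_of_t(n), 3))
```

If t really had a limiting value of zero, the printed spreads would fall toward zero as n increases; in this simulation they remain close to one.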

The book is singularly and pleasantly free from typographical errors—the 
only ones noted were “procedure” on page 207 and “m:/m:2” for “m2/m,” 
on page 211—and is excellently printed and bound. 

The reviewer would not have taken so much space to review a book he 
judged of little use. Although he disagrees with the author on almost all the 
really basic points, he plans to use the book in connection with a course in 
the design of experiment this fall. 


Quality Control: A Manual of Quality Control Procedure Based Upon Scientific 
Principles and Simplified for Practical Application in Various Types of Manu- 
facturing Plants. Norbert L. Enrick (Associate Professor of Management, South- 
western Louisiana Institute, Lafayette, La.). New York 13: Industrial Press 
(148 Lafayette St.), 1948. Pp. vi, 122. $3.00. (Brighton 1, England: Machinery 
Publishing Co., Ltd. [148 Lafayette St.].) Two reviews follow: 


Review By J. H. Curtiss 
Chief, National Applied Mathematics Laboratories 
National Bureau of Standards, Washington, D. C. 


According to the Introduction, this book is intended for practical men 
in inspection who do not want to be bothered with “higher mathe- 
matics,” but who would like to have statistical quality control explained in 
simple terms. “Higher mathematics” here means anything beyond grade 
school arithmetic. The spirit of the book is perhaps best conveyed by re- 
porting the fact that the presentation of control charts for averages is so 
arranged that in setting up the control limits, the mean range of a set of 
samples never has to be multiplied by any factor more complicated than 
unity! 

Thus the author has imposed rather severe conditions of limited visibility 
on the flight of his muse. The result is a sort of minimum cook-book of 
statistical quality control recipes, supplemented by some practical advice 
on the management aspects, and by some rather sketchy and disjointed re- 
marks on tolerances and gages. An elementary discussion of the underlying 
theory is also given in a few pages at the end of the book. 

The statistical quality control recipes occupy the first 45 pages or so of 
the book, with a little additional statistical material (mainly on “compressed 
limit gaging” and statistical study of tolerances) scattered through later 
chapters. The two main statistical techniques discussed are lot-by-lot in- 
spection, using sampling by attributes, and control charts for averages and 
ranges. There is no treatment of charts for numbers of defects and for proportion 
defective. 

At the outset of the chapter on lot-by-lot inspection, the author promises 
to demonstrate later that one should not use lot-by-lot inspection on in- 
spection lots containing less than 300 items, but the reviewer was unable 
to find the demonstration of this theorem in the ensuing text. Sampling 
tables are given in the form of double entry tabulation, one argument being 
lot size range, and the other “allowable per cent defective.” There are two 
tables for discrete items, one of them containing sequential sampling plans, 
and the other single sampling plans with operating characteristics similar to 
the sequential ones. These tables are supplemented by two roughly parallel 
ones for use on continuous products. Although no credit is specifically given 
in the text, the sequential sampling tables are copied from the “Inspection 
Handbook on Sampling for Quality Control,” QMC-M605-15, published by 
the Office of the Quartermaster General in 1945. Presumably the other tables 
are taken from material developed for later editions of the QMC Handbook. 

The concept of “allowable per cent defective” used in this book seems to 
be a sort of mixture of Average Outgoing Quality Limit (AOQL) and Ac- 
ceptable Quality Level (AQL) as these terms are now used in the technical 
literature. Mathematically, the “allowable per cent defective” ascribed to 
each plan is approximately equal to its AOQL, a fact implied by the brief 
elementary discussion of the theory of the plans given at the end of the book. 
The instructions and examples, however, seem to handle the “allowable per 
cent defective” as if it were an AQL. 
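
The distinction drawn here can be made concrete with a small calculation. The sketch below is illustrative only: the plan (a sample of 50, acceptance number 1, lots of 1,000) is hypothetical and is not taken from the book's tables, a binomial model is assumed for the number of defectives in the sample, and rejected lots are assumed to be screened 100 per cent. It computes the AOQL as the maximum of the average outgoing quality curve and, for comparison, the incoming quality that is accepted 95 per cent of the time, which is one common way of reading an AQL.

```python
from math import comb


def prob_accept(p, n, c):
    """P(at most c defectives in a sample of n) under a binomial model."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def aoq(p, n, c, lot_size):
    """Average outgoing quality when rejected lots are fully screened."""
    return p * prob_accept(p, n, c) * (lot_size - n) / lot_size


def aoql(n, c, lot_size, steps=2000):
    """Maximum of the AOQ curve over incoming quality p, by grid search."""
    return max(aoq(i / steps, n, c, lot_size) for i in range(steps + 1))


def quality_accepted_with_prob(target, n, c, steps=20000):
    """Largest grid value of p whose acceptance probability is at least target."""
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        if prob_accept(p, n, c) < target:
            break  # acceptance probability only decreases as p grows
        best = p
    return best


if __name__ == "__main__":
    # Hypothetical plan: n = 50, c = 1, lot size 1000.
    print("AOQL:", aoql(50, 1, 1000))
    print("p accepted 95% of the time:", quality_accepted_with_prob(0.95, 50, 1))
```

For this hypothetical plan the two figures come out near 1.6 per cent and 0.7 per cent respectively, so an inspector who treats an AOQL figure as if it were an AQL is working with a noticeably different number.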

The control chart for averages is set up as a test of the compound hypothesis 
that the population mean $\mu$ lies in the range $T_1 + 3.1\sigma \le \mu \le T_2 - 3.1\sigma$, 
where $T_1$ and $T_2$ are preassigned lower and upper tolerances for individuals 
and $\sigma$ is the population standard deviation. The test is carried out with the 
arithmetic mean of a sample of size 3 to 5, using a level of significance 
corresponding to a $2\sigma$ tail of the distribution of the mean. That is: the control 
limits are given by $T_1 + (3.1 - 2/\sqrt{n})\sigma$ and $T_2 - (3.1 - 2/\sqrt{n})\sigma$. If $3 \le n \le 5$, 
the theoretical mean value of the sample range (assuming normality) is 
roughly equal to $(3.1 - 2/\sqrt{n})\sigma$, so the very simple practical rule for finding 
the control limits mentioned in the first paragraph of this review is obtained. 
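
The approximation behind this rule is easy to verify. The short sketch below compares the multiplier 3.1 minus 2 over root n with the standard tabulated expected range of a normal sample (the constants usually called d2); the function names and the numerical tolerances in the usage example are the editor's illustrations, not anything given in the book.

```python
import math

# Expected range of a normal sample, in units of sigma (standard d2 constants).
D2 = {3: 1.693, 4: 2.059, 5: 2.326}


def limit_factor(n):
    """Multiplier of sigma in the control limits: 3.1 - 2/sqrt(n)."""
    return 3.1 - 2.0 / math.sqrt(n)


def control_limits(lower_tol, upper_tol, mean_range):
    """Simplified rule: move each tolerance inward by one mean range."""
    return lower_tol + mean_range, upper_tol - mean_range


if __name__ == "__main__":
    for n, d2 in D2.items():
        print(n, round(limit_factor(n), 3), d2)
    # Hypothetical tolerances 10.0 and 20.0 with a mean range of 1.5.
    print(control_limits(10.0, 20.0, 1.5))
```

The agreement is closest at n = 4 and loosest at n = 3, which is consistent with the reviewer's "roughly equal."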

This type of control chart of course differs from the standard Shewhart 
control chart for averages from the viewpoint of both engineering and math- 
ematical theory. An obvious minor disadvantage (which, however, may be 
a grave one for the intended users of this manual) is that a pair of tolerance 
limits must be given before the recipe can be carried out. A major disad- 
vantage is that a fundamental Shewhart control chart doctrine is ignored: 
a principal goal in quality control is the achievement of a state of statistical 
control about stable population values of $\mu$ and $\sigma$. In the present type of 
chart, $\mu$ is theoretically permitted to wander about at will between the 
limits given above. But the book is written primarily for the line inspector, 
and not for management, nor for quality control engineers; and perhaps a 
control chart which places first emphasis on the immediate avoidance of non- 
conforming product, as this one does, is the right one to present under the 
circumstances. The control chart for the range is given the orthodox treat- 
ment. 

In his effort to be brief and clear, the author omitted some points which 
the reviewer considers rather essential for the proper application and inter- 
pretation of even the few simple techniques here treated. No operational 
meaning is given to the words “random sampling,” which are explained 
in a circular manner by simply repeating the word “random” in a couple of 
different contexts. Rational subgroups are not adequately discussed in 
connection with control charts. In the discussion of tolerance ranges, the 
correct location of the range, as determined statistically, is ignored, and only 
the width of the range is discussed. 

As stated before, the technical material on statistical control takes up a 
little less than half of this 120 page book, and in the opinion of the reviewer, 
it could have been presented in an unhurried pamphlet of about 25 pages. 
The density of thought in the chapters on tolerances and gaging, and in 
other later chapters, seemed rather low, and the reviewer wondered how 
much of that material would be new or useful to an experienced inspector. 
The book certainly does not begin to cover adequately the non-statistical 
aspects of quality control, and indeed the author would probably disavow 
any intentions in this direction. 

On the other hand, the style is simple and clear; the many examples are 
well-chosen and informative; and all-in-all, this is a very readable little 
treatise. It must be left to members of the intended audience, rather than 
to this reviewer, to judge whether the extra pages were well worth the time 
and effort. The judgment may very well be in the affirmative. 


Review By E. H. MacNiece 
Director of Quality Control, Johnson & Johnson 
New Brunswick, New Jersey 


This book effectively leads the reader into a simplified quality control 
program. The program is clearly outlined and stated in such a manner 
that the shopman is familiarized with the use of the method without the 
complicated terminology and mathematics so frequently found in books on 
this subject. Perhaps its greatest service will be in the conditioning of non- 
technical shop personnel for the acceptance of quality control as a means of 
achieving productivity in terms of acceptable quality with low waste rather 
than high production with too much of it finding its way to the scrap pile 
or the salvage department. Mr. Enrick’s book is highly recommended as 
primary reading for men in industry who want to produce acceptable eco- 
nomic quality. 


Traffic Performance at Urban Street Intersections. Bruce D. Greenshields, 
Donald Schapiro, and Elroy L. Ericksen. Yale Bureau of Highway Traffic, 
Technical Report No. 1. New Haven, Conn.: Bureau of Highway Traffic, Yale 
University, 1947. Pp. xv, 152. Gratis. 


REVIEW BY HARRY G. ROMIG 
Member of Technical Staff, Bell Telephone Laboratories, Inc. 
463 West St., New York 14, N.Y. 


This report presents a practical as well as a statistical analysis of traffic 
data covering “the intersections of streets at grade in urban areas.” 
Its importance is readily realized since “one-half of all urban traffic accidents 
and more than three-fourths of all delays experienced in dense urban areas 
are related to intersections.” The traffic engineer will find much new valuable 
material in this report, and should have it handy as each page presents im- 
portant details that require careful study. 

The manner of presentation is excellent. The table of contents and the 
complete index at the end are in sufficient detail to make them satisfactory 
for ready reference. As much of the descriptive matter centers around the 
figures and tables, the authors provide a fine descriptive list of each with 
accompanying page references. The book consists of six chapters and six 
appendixes. Chapter 1 presents the techniques used in collecting the field 
data in permanent form for analysis. Photographic devices used are de- 
scribed in sufficient detail that others may follow the same procedures in 
making other similar surveys. Chapter 2 describes “Starting Performance 
at Signalized Intersections.” Practical and theoretical solutions are pre- 
sented. Chapter 3 covers “Deceleration of Motor Vehicles at Street Intersec- 
tions.” The findings of the study are given in a simple but forceful manner. 
Chapter 4 presents the “Behavior Patterns at Unsignalized Intersections.” 
The Methods of Analysis are described together with the Detailed Analysis 
of Specific Aspects of Behavior. Chapter 5 considers “Highway Traffic and 
the Theory of Probability.” The nature of the distributions found is dis- 
cussed and it is shown that the Poisson series may be used effectively in the 
analysis as the Poisson distribution appears to fit the data. There are two 
distinct parts to this chapter, one describing the General Theory and a second 
covering the Theory of Random Distribution applied to Signalized Inter- 
sections. Chapter 6 describes “Typical Traffic Problems” and indicates their 
solutions. An excellent summary of each chapter is given at its close in all 
cases but the First and Fifth. Chapter 1 has no summary, while Chapter 
5 has a summary for the general theory and also a summary covering the 
case for random distributions. Also six valuable appendixes have been pro- 
vided dealing in mathematical relations, tables and important theories that 
have been expanded in detail to supplement the main report. 

Throughout the study, in taking pictures or making graphs, frame time 
intervals of 1/88 of a minute were used. This makes it possible to express 
velocity directly in miles per hour if measurements of distance between time 
intervals are expressed in feet, i.e., an automobile traveling 5 feet in one such 
time interval has a velocity of 5 mi./hr. Pictures were taken at sufficiently 
high elevations to provide a view of the intersections studied and timing 
devices were included to permit ready identification of the different frames. 
Later, it was possible to study each frame individually or a run of frames to 
properly analyze the different conditions under study. In addition to the 
splendid charts provided in this report, 9 photographs are included showing 
the intersections involved, and the projector and mounting. 
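
The convenience of the 1/88-minute frame interval follows from the fact that one mile per hour is 88 feet per minute. A minimal sketch of the conversion is given below; the function and constant names are hypothetical and serve only to spell out the arithmetic.

```python
FEET_PER_MILE = 5280
MINUTES_PER_HOUR = 60
FRAMES_PER_MINUTE = 88  # one frame every 1/88 of a minute


def speed_mph(feet_per_frame):
    """Speed in miles per hour from the distance covered between two frames."""
    feet_per_hour = feet_per_frame * FRAMES_PER_MINUTE * MINUTES_PER_HOUR
    return feet_per_hour / FEET_PER_MILE


if __name__ == "__main__":
    print(speed_mph(5))  # 5 feet per frame
```

Because 88 times 60 equals 5,280, the conversion factor is exactly one, so feet per frame and miles per hour are numerically the same.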

In studying starting performance at signalized intersections, attention was 
given to three factors: (1) time required for vehicles to commence motion, 
(2) distances reached by vehicles in given time intervals after starting, and 
(3) spacing between vehicles. Small trucks are treated the same as passenger 
cars, but buses and large trucks are studied separately. Where no signal 
occurs at an intersection or one street has a “Stop” sign, collision points were 
selected in the middle of the intersections. Velocity, delay factors, reactions 
of different drivers, and other factors were considered. Time value to the 
collision point was found to be the main criterion by which drivers decide 
to take precedence over vehicles approaching on the cross street. 

The report added much to its value by including the range of variation 
indicated by the data for the various situations covered. Its last two chapters 
discuss the application of probability theory to the problem considered and 
indicate the use of the Poisson exponential distribution. The authors indicate 
that in 1934 “the theory that traffic follows a random distribution was as- 
sumed by Mr. John P. Kinzer in an article in which he calculated the prob- 
ability of any car picked at random going a mile without interference or 
delay on a two lane road with a given volume of traffic.” The authors develop 
the theory and show that with the exception of one second spacings the 
Poisson theory fits their data fairly well. The exception is due to the desire 
of drivers to avoid rear end collisions. It is possible to apply the theory and 
obtain reasonable solutions to many traffic problems, which formerly defied 
solution. 
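
For readers unfamiliar with the random-flow assumption, the following sketch shows the two quantities it yields most directly: the probability of a given number of arrivals in an interval, and the probability that the gap between successive vehicles exceeds a given time. The traffic volume used is hypothetical and the function names are the editor's; the sketch does not model the shortage of one-second spacings that the authors report.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k arrivals when the expected number is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_gap_exceeds(seconds, vehicles_per_hour):
    """Probability that a headway exceeds `seconds` under Poisson arrivals."""
    rate_per_second = vehicles_per_hour / 3600.0
    return math.exp(-rate_per_second * seconds)


if __name__ == "__main__":
    volume = 360  # hypothetical flow, vehicles per hour
    interval = 10  # seconds
    expected = volume / 3600.0 * interval
    for k in range(4):
        print(k, round(poisson_pmf(k, expected), 4))
    print("P(gap > 10 s):", round(prob_gap_exceeds(interval, volume), 4))
```

The probability of no arrival in an interval and the probability that a gap exceeds that interval are the same number, which is why observed spacings provide a direct check on the Poisson assumption.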

Many numerical examples are given and also relations covering the solu- 
tion of different varieties of traffic problems. Approximate relations are given 
for use in solving problems when the work of computation becomes too diffi- 
cult for obtaining the exact theoretical solution. Chapter 5 covers the 
theoretical treatment and Chapter 6 presents solutions to a number of 
typical traffic problems. To readers other than traffic engineers it would have 
been helpful to have presented in an introduction, or in Chapter 1, the typi- 
cal traffic problems that are to be solved. Even after reading the report 
several times, it is not clear how the timing of red and green signals is obtained 
for the most efficient movement of traffic. When should there be a 
flashing red? a flashing amber? a policeman in control? a Stop signal at only 
one intersection? a Stop signal at two intersections? Chapter 6 is supposed to 
provide answers to some of these questions. Those preparing the report 
were doubtless more interested in the analysis of their results than in delin- 
eating how these results can be applied. The last example on signal timing is 
excellent. More applications of this nature should be included. Other reports 
can be made more valuable by spending a little more time at the beginning 
and end in showing how to use the findings. 


Mathematics of Sampling. Walter A. Hendricks (Principal Agricultural Statistician, 
Bureau of Agricultural Economics, Washington 25, D. C.). A summary 
of a course of lectures given during the 1947 Statistical Summer Session at Virginia 
Polytechnic Institute. Virginia Agricultural Experiment Station, Special 
Bulletin. Blacksburg 13, Va.: the Station, February 1948. Pp. ii, 45. 
Gratis. 


Review By T. A. BANCROFT 
Research Professor of Statistics and Director of the Statistical Laboratory 
Alabama Polytechnic Institute, Auburn, Alabama 


Although the material for the most part is not new, having been taught 
in survey sampling courses at various statistical centers, in particular at 
Iowa State College, it presents in published form an introduction to the 
mathematics of survey sampling. It should be a welcomed addition to the 
unfortunately small amount of published material available in this rapidly 
expanding field of statistics. The booklet should be of value as a reference for 
workers engaged in survey sampling as well as for teachers and students of 
its theory and practice. 

The mathematical aspects stressed are those that are basic to an under- 
standing of the sampling designs and analyses used in actual sampling sur- 
veys conducted at the present time and for the most part by various federal 
agencies and certain universities with strong statistical sections. Since the 
booklet is concerned with the mathematics of sampling, no attempt is made 
to discuss techniques of planning, schedule or questionnaire construction, 
organization, field operations, etc. A good idea of the type of material discussed 
can be obtained from the following list of headings: Classical Error 
Theory, Random Sampling in Practice, Analysis of Variance and the Esti- 
mation of Variance Components, Stratified Sampling, Subsampling, Cluster 
Sampling, Binomial and Multinomial Sampling Variation, The Problem of 
Nonresponse, Linear Regression, and The Method of Least Squares. A 
selected but valuable list of references is given in a section on suggested 
reading. In the reviewer’s opinion the value of the booklet would have been 
greatly enhanced by the addition of sections on: choice of sampling unit, 
determination of sample size, confidence intervals, double sampling, and 
variances of totals based on various methods of estimation. 

Although the title of the booklet contains the word “Mathematics,” no 
attempt has been made to give either rigorous detailed mathematical proofs 
or to introduce useful powerful mathematical concepts or machinery to 
shorten such proofs. Instead the heuristic approach has been used, the details 
of proofs in many cases being suggested rather than explicitly stated. General 
theorems, probability distributions, and formulas have been advanced as 
true because of their analogy with simpler cases. The manner of presentation 
is understandable since the booklet is a summary of a few lectures covering a 
broad field. It seems to the reviewer that a valiant effort has been made, by 
the use of these methods, to make the methodology of survey sampling rea- 
sonable to workers engaged in this field who may have a modest background 
in the elements of mathematical statistics. If such be the case, it seems to the 
reviewer that on the whole the author has been successful with one important 
exception. It is the opinion of the reviewer that the fundamental assumptions 
and limitations involved in setting up various mathematical models, espe- 
cially in the case of the analysis of variance and of such proofs as indicated in 
formula (52) and the simpler case at the bottom of page 11, should be given. 
It is true that an indication of the fundamental assumption of linearity in 
the analysis of variance model is given in equation (50), but in the reviewer’s 
opinion a greater understanding would have resulted from beginning with 
the usual assumptions, i.e., $x_{ij} = \mu + t_i + e_{ij}$, etc., and with detailed definitions, 
even though the derivation in the latter case is longer. 

No mention seems to have been made of the formulas for the variance of 
a product and the variance of a quotient. For the sake of comparison, it 
would seem desirable to give a proof of the usual formula found in the 
literature for the variance of the mean of a sample from a finite population 
in addition to the one given in the booklet. 

There are several typographical errors. Also the reviewer differs with the 
author on several points of notation. On page 6 in (28) and again on page 
15, r has been used in place of p. On page 12, Table 2, s? should be added to 
Ko,? for the mean square between classes. On page 9, in (41), it would seem 
more appropriate to replace $s^2$ by $(s')^2$ since $s^2$ is defined by (38). Page 13, 
(49) and (50) should have ky and &; respectively on the left sides. In solving 
for various variance components, greater clarity should result in replacing 
a.” and o* in the equations of estimation, by 63 and 6". 


Principles of Medical Statistics, Fourth Edition. A. Bradford Hill (Professor 
of Medical Statistics and Director of the Department, London School of Hygiene 
and Tropical Medicine, University of London, London, England). London 
W.C.2: Lancet Ltd. (7 Adams St., Adelphi), 1948. Pp. xi, 252. 10s. 6d. 


REVIEW BY MARGARET MARTIN 
Assistant Professor of Preventive Medicine and Public Health 
Vanderbilt University, Nashville, Tennessee 


News that A. Bradford Hill’s excellent book on medical statistics is again 
available, in the form of an enlarged fourth edition, is indeed welcome. 
The principal changes from earlier editions are the addition of a new chapter 
on averages, a section on the normal curve, and the expansion of the chapters 
on frequency distributions and graphs, chi square, life tables, and standard- 
ized death rates. 

The clarity of the presentation, the emphasis on the meaning and inter- 
pretation of statistical results, and the inclusion of numerous examples 
illustrating the dangers of careless statistical thinking account for the popu- 
larity which this work has enjoyed since its first appearance as a series of 
articles in The Lancet in 1937. Medical students, physicians, and other work- 
ers in the medical fields who wish to gain an understanding of the principles 
of elementary statistics will find it most helpful and stimulating. 

On the whole the selection of material to be included in this elementary 
text of nonmathematical character is excellent. The reviewer feels that it 
would be desirable to have included a table of probabilities for the normal 
curve to be used in significance tests, especially since such a table is given for 
chi-square; that in the discussion of significance tests for proportions, more 
detailed consideration might have been given to the conditions necessary 
for reasonably reliable application of normal curve theory and to the correc- 
tion for continuity; and finally, that follow-up studies in which cases are 
under observation for fractional parts as well as for whole numbers of years 
might have received more complete treatment. On the other hand, the calculation 
of the average length of after-life in a study in which life experience is 
not complete for all patients (p. 173) does not seem to be particularly useful 
and might, in fact, lead to misinterpretation. 

A few minor errors have been noted. In the calculation of the median of 
grouped frequency distributions (pp. 49-51), the point below which there 
are (N +1)/2 instead of N/2 observations, assuming that the observations 
are evenly distributed over the interval in which the median falls, is obtained. 
In the diagram on page 65 the intervals labeled one, two, and three standard 
deviations, respectively, are actually twice this amount. The appearance 
of a “minus sign” in line 6 of page 74, when algebraic signs have been ignored 
in corresponding situations in earlier examples (i.e. in a correction term 
which is to be squared), might cause some confusion to the reader. In the defi- 
nition of the weighted mean on page 245, some necessary parentheses have 
been omitted in the numerical example. In the definition of the chi-square 
test on page 246, the word “frequencies” would seem to be more appropriate 
than the word “values.” 
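
The first of these errors can be illustrated with a small computation. The sketch below interpolates linearly within the class interval containing the required point, once for N/2 observations and once for (N + 1)/2. The frequency table is invented for illustration and is not the one in the book, and the function names are hypothetical.

```python
def grouped_quantile_point(lower_bounds, width, freqs, target):
    """Point below which `target` observations lie, assuming observations
    are spread evenly over the class interval that contains the point."""
    cumulative = 0
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("target exceeds the total frequency")


def grouped_median(lower_bounds, width, freqs):
    """Median of a grouped distribution: the point with N/2 observations below."""
    return grouped_quantile_point(lower_bounds, width, freqs, sum(freqs) / 2)


if __name__ == "__main__":
    # Invented table: classes 0-10, 10-20, 20-30, 30-40 with N = 40.
    lowers, width, freqs = [0, 10, 20, 30], 10, [5, 10, 20, 5]
    n_total = sum(freqs)
    print(grouped_median(lowers, width, freqs))
    print(grouped_quantile_point(lowers, width, freqs, (n_total + 1) / 2))
```

The two results differ by half an observation's share of the class width, a small amount in large samples but one that is not negligible when the median class holds few observations.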


An Experimental Introduction to the Theory of Probability. J. E. Kerrich 
(Senior Lecturer in Mathematics, University of the Witwatersrand, Johannes- 
burg, South Africa). Copenhagen, Denmark: Einar Munksgaard, 1946. Pp. 98. 
Paper. 


REVIEW BY J. F. KENNEY 
Associate Professor of Mathematics, University of Wisconsin 
623 West State St., Milwaukee 3, Wisconsin 


Undoubtedly many teachers have had experiences similar to those of the 
author in presenting lectures on elementary statistics to mixed classes 
of students and colleagues who vary widely in mathematical preparation 
and whose interests lie in diverse branches of science. “It is a most interesting 
problem,” the author remarks, “to design lectures suitable for such a class.” 
An opportunity to design suitable material on one topic came to him when 
he found himself interned (for his own safety, as a British subject) by the 
Danish government during the recent war. Thus he had the leisure and pa- 
tience to conduct the simple but extensive experiments on random events 
that comprise the subject of this book. The main experiments consisted of 
spinning a coin 10,000 times and drawing 5,900 times two ping-pong balls 
out of four of which two bore a red trade-mark and two a green trade-mark. 
(The drawings were made by a fellow-internee “at a rate of about 400 times 
an hour with—need it be stated—periods of rest between successive hours.”) 
Also an experiment equivalent to tossing a biased coin was performed with 
a small wooden disc coated on one face with lead. 

Various results from these experiments are recorded in tabular and graphical 
form. Data are analyzed both in the large and in sub-sequences with respect 
to various ratios such as $m/n$, where $m$ denotes the number of heads and $n$ 
the number of spins, and $m_2/(m_1 + m_2)$, where $m_2$ denotes the number of times that 
green was second in the $m_1 + m_2$ experiments in which red appeared first. 
The analysis leads to a body of ideas, namely, a mathematical theory which 
describes the observations. Thus the author arrives at the tools of pure mathe- 
matics. Using appropriate symbols he discusses complementary, joint, 
mutually exclusive, compound, and conditional events, the addition and mul- 
tiplication principles, and the binomial distribution. The normal distribution 
is mentioned briefly and an introduction is given to the notions of estima- 
tion and confidence intervals. 
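
Readers without the author's enforced leisure can imitate the two main experiments by simulation. The sketch below is not the author's procedure: it assumes a fair coin and equally likely draws without replacement, uses a pseudo-random generator in place of physical spinning and drawing, and takes m1 to be the number of draws in which red is followed by red, a reading inferred from the description above.

```python
import random


def heads_ratio(spins, rng):
    """m/n for a simulated fair coin: m heads in n spins."""
    heads = sum(rng.random() < 0.5 for _ in range(spins))
    return heads / spins


def green_after_red_ratio(draws, rng):
    """m2/(m1 + m2): among draws whose first ball is red, the fraction
    in which the second ball is green."""
    balls = ["red", "red", "green", "green"]
    m1 = m2 = 0  # m1: red then red; m2: red then green
    for _ in range(draws):
        first, second = rng.sample(balls, 2)  # two balls without replacement
        if first == "red":
            if second == "green":
                m2 += 1
            else:
                m1 += 1
    return m2 / (m1 + m2)


if __name__ == "__main__":
    rng = random.Random(1946)
    print(heads_ratio(10_000, rng))
    print(green_after_red_ratio(5_900, rng))
```

Under these assumptions the first ratio should settle near 1/2 and the second near 2/3, since two of the three balls remaining after a red is drawn are green.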

In the reviewer’s opinion the author has admirably achieved his objective 
as stated in the Foreword: “In this book, a little ground is covered thor- 
oughly and great pains are taken to try to present a clear picture of the physi- 
cal significance of a mathematical probability. With this background the 
student will be better equipped to study the many texts which deal with 
‘pure’ theory based on a system of axioms.” And his hope is well founded 
when he says: “It is hoped that students of these pages will never have to 
reject any of the ideas given here, no matter how much they may refine them 
as their knowledge of the subject grows.” 


Statistical Methods in Medical Research: I, Qualitative Statistics (Enumera- 
tion Data). Donald Mainland (Professor of Anatomy, Dalhousie University, Hali- 
fax, Nova Scotia). Reprinted from the Canadian Journal of Research, Section E, 
Medical Sciences 26 (1): 1-181 February 1948. Ottawa, Canada: National Re- 
search Council of Canada, 1948. Pp. 181. Paper. Apply. Two reviews follow: 


REVIEW BY JOHN W. FERTIG
Professor of Biostatistics, School of Public Health of the Faculty of Medicine 
Columbia University, New York 32, N. Y. 


THIS article is essentially an expansion of Chapters 2 and 3 of the author’s
The Treatment of Clinical and Laboratory Data (Edinburgh, Scotland:
Oliver and Boyd Ltd., 1938). A detailed consideration is given to small sam- 
ples of enumeration data in which the normal curve or chi square solutions are 
not completely satisfactory. Fifty-four pages of tables are presented giving 
confidence limits for a two-fold classification of enumeration data and prob- 
abilities or significant differences for four-fold contingency tables. There is 
also a table of chi square and one of four place logarithms of factorials. The 
text covers 103 pages and is divided into an introductory section of 11 pages, 
one of examples covering 54 pages, and one of explanatory semi-theoretical 
notes covering 36 pages. Most of the examples are concerned with the com- 
parison between a sample and a population relative frequency and with the 
four-fold table, including numerous variations of these problems. The prob- 
lem of non-dichotomous scales is only briefly considered, as is the problem of 
combining information from two or more samples. 

The suggested treatment of the numerous examples consists largely in 
aiding the reader to utilize the tables contained in the article. Practically no
treatment of the rationale of the method is given at the time of the discussion 
of the example. This is reserved for the section on notes. Each example is, 
however, followed by a series of helpful comments. While this reviewer recog- 
nizes that the investigators for whom this presentation is intended are often 
not very patient with a discussion of the reason for a certain statistical 
method, he still feels that the incorporation of the section on notes together 
with the examples would have produced a much better appreciation of the 
techniques. 

It is sometimes difficult to appreciate the reason for the author’s pref- 
erence for chi square, for example, on page 39: “Some investigators still use 
the standard deviation or standard error, √(Npq), instead of chi square, for
comparison of the sample. This is not to be recommended.” It is not pointed 
out clearly that the correction for continuity can be used for the normal 
curve as well as for chi square. The author recommends the summation of 
chi square values for combining information from several samples, but this
method may at times be unduly conservative. 
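
The point about the correction for continuity can be made explicit for the comparison of a sample with a population relative frequency. The notation below is an assumption introduced for illustration: x is the observed count in a sample of N, p the population relative frequency, and q = 1 - p. With the correction applied to both, the square of the normal deviate coincides with chi square on one degree of freedom.

```latex
z = \frac{|x - Np| - \tfrac{1}{2}}{\sqrt{Npq}},
\qquad
\chi^2_c
  = \frac{\bigl(|x - Np| - \tfrac{1}{2}\bigr)^2}{Np}
  + \frac{\bigl(|(N - x) - Nq| - \tfrac{1}{2}\bigr)^2}{Nq}
  = \bigl(|x - Np| - \tfrac{1}{2}\bigr)^2 \left(\frac{1}{Np} + \frac{1}{Nq}\right)
  = \frac{\bigl(|x - Np| - \tfrac{1}{2}\bigr)^2}{Npq}
  = z^2 .
```

The middle step uses |(N - x) - Nq| = |x - Np| and 1/(Np) + 1/(Nq) = 1/(Npq).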

The author has to some extent achieved his goal of classifying certain 
types of problems relating to enumeration data, and of telling the investi- 
gator how to find his problem and a suitable answer for it. The tables sup- 
plied are indeed very comprehensive and useful. However, it seems to this 
reviewer that the approach is too mechanistic and would be unsatisfactory 
for many investigators. 


REVIEW BY A. BRADFORD HILL
Professor of Medical Statistics and Director of the Department 
London School of Hygiene and Tropical Medicine 
University of London, London W.C.1, England 


DR. MAINLAND says that his article was devised “to meet the wishes of
those who, in the words of one investigator, would say: ‘I have a problem
on hand... Must I spend a month of free evenings reading a book from 
end to end several times and mastering all details before deciding how to go 
about solving the problem? I hope not.’” He might have retorted that work- 
ers in the medical sciences would not expect to use, say, bacteriological or 
pathological techniques without mastering the details and is there any rea- 
son why statistical processes should be regarded differently? There is, per- 
haps, at least an excuse. Few workers unless trained in such subjects as 
bacteriology or pathology will be so bold as to embark upon them; almost 
all, whatever their subject matter, will sooner or later be faced with statis- 
tical data and have to interpret them. Often too, particularly in clinical 
medicine, their numbers of observations will be small. It is, therefore, legiti- 
mate to argue that it is better to give the worker easy access to tests of sig- 
nificance which he may imperfectly understand rather than to let him rely 
solely upon that “common-sense” which is, in fact, so uncommon. 

The serious danger of this procedure, which Dr. Mainland recognizes, is 
that the worker may come to regard the mathematical tests as the most 
important part of the statistical methodology and forget that of much 
greater importance “are, first, the planning of the experiment or observation 
so that valid inferences shall be obtainable, and, secondly, the interpretation 
of the results of the mathematical tests.” In the experience of the reviewer 
the latter is the greater risk, that too frequently today there is a tendency 
to regard “non-significant” as implying guiltless rather than non-proven, 
“significant” as proven and therefore due to a particular causal factor. Dr. 
Mainland has certainly endeavoured throughout his article to guard against 
these very undesirable by-products of his plan of presentation.

This plan is as follows. In an introductory section he discusses, briefly, 
some general principles and definitions—random sampling, probability, 
confidence limits, the comparison of samples, levels of significance, etc. He 
then passes to what is the crux of the article for the investigator quoted at
the beginning of this review, namely the working out of 40 examples of medi- 
cal problems classified so that the worker can choose data and problems com- 
parable to his own and then easily carry out the demonstrated probability 
calculation. As the article is confined to qualitative statistics the types of 
problem are mainly the argument from a sample to its population and 
the comparison of two or more samples (with subsidiary questions that 
flow from them). A final section of “notes” discusses the underlying prin- 
ciples and methodology in much greater detail and the article concludes 
with some extremely useful tables. These comprise binomial confidence 
limits (with graphs as well) over a wide range of size of sample and of values, 
and also exact probabilities for small-sample fourfold contingency tables— 
the probabilities for equal samples up to N = 20 and the significant
differences for unequal samples up to N_1 = 20 and N_2 = 19. These
latter should clearly be of great help to many workers, as will also a table of 
the logarithms of factorials of numbers up to 1,000 for the calculation of 
exact probabilities not tabulated. For samples not covered by the tables 
precautions and rules regarding the use of chi squared have been derived 
from more than five hundred comparisons between chi squared and the 
exact method. 
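
For a fourfold table not covered by printed tables, the exact probability can be computed from logarithms of factorials, as the paragraph above indicates. The following is a minimal sketch, assuming fixed marginal totals (the hypergeometric form of the exact test) and natural logarithms from `math.lgamma` in place of a printed table; the function names are illustrative and not Dr. Mainland's.

```python
from math import exp, lgamma

def log_factorial(n):
    """Natural logarithm of n!, standing in for a table of log-factorials."""
    return lgamma(n + 1)

def table_probability(a, b, c, d):
    """Probability of the fourfold table [[a, b], [c, d]] with all margins fixed."""
    n = a + b + c + d
    log_p = (log_factorial(a + b) + log_factorial(c + d)
             + log_factorial(a + c) + log_factorial(b + d)
             - log_factorial(n)
             - log_factorial(a) - log_factorial(b)
             - log_factorial(c) - log_factorial(d))
    return exp(log_p)

def one_tailed_exact(a, b, c, d):
    """Sum the probabilities of the observed table and of every table with a
    larger first cell, the marginal totals being held fixed."""
    total = 0.0
    for k in range(min(b, c) + 1):
        total += table_probability(a + k, b - k, c - k, d + k)
    return total

if __name__ == "__main__":
    print(table_probability(3, 0, 0, 3))
    print(one_tailed_exact(2, 1, 1, 2))
```

Working with logarithms keeps the arithmetic manageable for large factorials, which is presumably why a table of them is supplied with the article.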

A criticism that might be made is that Dr. Mainland is rather inclined to 
overstate the case for using the “exact” methods he gives—how often, in 
fact, would the observer be misled by the cruder methods if he were cautious 
in borderline cases?—and to place considerably too much confidence in the 
results given by very small samples. While agreeing with him that “no 
sample is too small for statistical assessment” one may yet, for instance, 
with a mere handful of sick persons to compare remember their innate vari- 
ability and Dr. Mainland’s own emphasis on the importance of “the inter- 
pretation of the results of the mathematical tests.” However that may be, 
this heavy piece of work should certainly help the medical investigator to 
apply without tears his tests of significance to small samples—though it is 
unlikely that he will do so intelligently unless he is prepared to take some 
trouble to understand what it is all about. 


Mathematical Theory of Human Relations: An Approach to a Mathematical 
Biology of Social Phenomena. N. Rashevsky (Associate Professor of Mathe-
matical Biophysics, University of Chicago, Chicago 37, Ill.). Mathematical
Biophysics Monograph Series No. 2. Bloomington, Ind.: Principia Press, 1948. 
Pp. xiv, 202. $4.00. 


REVIEW BY FREDERICK MOSTELLER 
Associate Professor of Mathematical Statistics 
Department of Social Relations, Harvard University 
Cambridge 38, Massachusetts 


RASHEVSKY’S Mathematical Theory of Human Relations has the subtitle
“An Approach to a Mathematical Biology of Social Phenomena.” It 
is interesting to notice that Rashevsky feels that such topics as distribution 
of city sizes, economic interaction of the social group, variations in the class
structure of a group, individualistic (capitalistic) and collectivistic (cooper- 
ative) societies, history of nations, theory of war, all come within the scope 
of mathematical biology. This view fits in well with the present breakdown 
of borders between sciences. 

The writings are largely publications of Rashevsky in Psychometrika 
collected in such a way as to provide continuity to the exposition. The con- 
tinuity is more one of method than of subject matter. The casual reader 
will find that the topics dodge around rather rapidly. Indeed, the book is 
something of a hodge-podge. It contains many early thoughts not very thor- 
oughly worked out but apparently put down quickly as they came to mind 
by a rather prolific but not very elegant writer. 

The principal method employed is that of differential equations, used 
somewhat in the manner of the applied physicist. One has the feeling that 
the problems were made to fit the mathematics with which the author has 
been successful in treating other problems, rather than making the mathe- 
matics suitable to the problems. Occasionally the author sidles into integral 
equations but no serious attempt is made to do anything about them. 
However, the integral equation approach did look rather promising before 
it was dropped like a hot potato. 

In reviewing such a book about human relations, one has to consider the 
scarcity of mathematical works on this subject outside the fields of economics 
and population. Naturally statistical methods are widely used in all social 
sciences, but these are usually employed for descriptive purposes rather than
as mathematical models. Occasionally there are statistical or probability 
models which can be classified as mathematical models in the sense that they 
try to explain the way certain processes combine to produce a certain out- 
come, and these models have the property that many different aspects of 
the situation can be derived from the original set of assumptions. The work 
of Zipf has been largely the collection of certain kinds of number anomalies 
with guesses about the sociological meaning of these anomalies, while 
Stewart working on the same subject seems to be trying to build up a theory 
leading to these number anomalies from a set of assumptions. Starting from 
theory which he has developed for another purpose, Rashevsky tries to 
investigate the distribution of city sizes but is not very successful. Lewis F. 
Richardson in Generalized Foreign Politics (British Journal of Psychology 
Monograph Supplements, No. 23, 1939) attempts to study the theory of 
stability of peace between two or more nations largely by the study of the 
behavior of linear differential equations. Richardson is not quite so ambitious 
as Rashevsky. He studies conditions which will lead to war and does not 
attempt to say when war will occur, nor when one side or the other will be 
defeated, nor how the action will be carried out, while Rashevsky does make 
attempts of such a nature. The contrast between the work of Richardson 
and that of Rashevsky is worth noting because the one man takes a single 
topic and works it very extensively, while the other prefers to handle many 
topics thinly. 

Rashevsky places a critic of his work in a very difficult position. He states,
as we would expect any man building mathematical models to do, that none 
of the models he presents are to be taken as sacred or complete or more than 
a gross oversimplification of the techniques he has in mind. Further, even 
when he goes out of his way to get some data and compare his theory with 
some facts, he claims that no one should take the results of the comparison 
seriously as supporting the particular theory he has in mind. In every case 
as far as the reviewer can tell, he regards his examples as “only an illustra- 
tion” of what a man constructing mathematical models might hope to achieve 
and improve on if he were to make a careful extensive study of the problem. 
This attitude makes it difficult for us to know whom Rashevsky wrote the 
book for. Presumably a man familiar with mathematical models would know 
something about the kinds of things that might be achieved through the use 
of mathematics and therefore would not need all these illustrations. The 
social scientist who does not know himself how to handle mathematical
models will probably feel that instead of producing all these illustrations of 
what might be accomplished, Rashevsky might have done better to take one 
problem and work on it. His attitude might be that one good investigation 
would win him over. Probably Rashevsky protests more than he means and 
really feels he has a fairly general approach to many social science problems, 
and he may even feel that he has produced a good framework for building. 
In addition, Rashevsky may also feel that the reason so little work has been 
done by applied mathematicians on social science problems (outside the 
afore-mentioned fields of economics and population) is that the mathemati- 
cians see no way to attack these problems; that if encouragement of the kind 
he is offering is given, research people may see their way clear to relieving 
the scarcity of work in this field. It is very possible that this book may have 
the effect of goading researchers to work in this field, because some may feel 
that Rashevsky has stated his problems poorly and that too many problems 
are left wide open by Rashevsky’s approach. If the book produced only this 
effect, the author will have made an important contribution to the develop- 
ment of social science. 

The book opens with a “Preface and Explanatory Remarks” section in
which the author gives some arguments why mathematics should be allowed 
to be used in the study of social phenomena and includes a fairly lengthy 
criticism of this work by the author. Indeed, anyone wishing to criticize 
this book will be helped by reading Rashevsky’s own criticisms. Chapter 1 
considers the nature and effect of the influence of one individual on another 
and provides a definition of social class. Generalizations are achieved by 
introducing the notions of distribution of individuals in space, and in time, 
and the notion of social mobility. It is unfortunate that this important first 
chapter is not written a little more carefully; for example, on page 3 line 6 
the reader is confused between an activity and the intensity of an activity. 
On page 4 the author has not been careful about his use of absolute value 
signs or else he has changed his assumptions without informing the reader. 
It is likely that the reader will be a little startled to see a multiple integral
with definite limits suddenly lose two of its three integration signs with no 
caution from the author that the single integration sign is to stand for all 
three, even though it has definite limits attached which differ from those on 
the other two. 

Chapters 1 and 2 are largely confined to a discussion of functions of several 
variables with no definite form assigned. Averages or expected values are 
largely used. In the study of individuals grouping themselves into classes 
the author considers the case of one variable F and defines individuals as 
belonging to the same class when (F′ − F)² − A² < 0, where the F’s are the
values of the characteristic determining the class structure for the two in-
dividuals and A is some constant. For large groups of individuals, the extent
of the upper class is found by averaging the left-hand side of the above in- 
equality over a portion of the joint distribution of two individuals inde- 
pendently drawn from the distribution of the characteristic. If we know 
N(F), the distribution of the characteristic, and the number of classes in
the society, we can in principle calculate A. 

In Chapter 3 Rashevsky gives an approximate treatment of the interaction 
of social classes in which he uses an all-or-none principle, that is, either 
individuals are “active” or “passive.” The active individuals belong to two 
groups each with a single activity in which they try to persuade the passive 
members to join them. The problem seems to be to see what kinds of condi-
tions will lead to all the passive individuals occupying themselves with one
activity or the other. Rashevsky feels that his results agree in general with 
the rapid spread of mass hysterias and revolts, and the reviewer feels that 
the ideas may approximate the results observed in fads, fashions, rumours, 
or propaganda. It is interesting to note that the differential equations pro- 
duced are of the same form as Richardson’s armament equations. This is 
not surprising because Rashevsky is dealing with warfare between two 
groups for possession of a third. Rashevsky’s equations simplify more than 
Richardson’s because of an additional restriction. Richardson is not men- 
tioned. 
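
The review reproduces neither set of equations. As a hedged illustration only, the sketch below uses the linear form usually attributed to Richardson's armament model, dx/dt = ky − ax + g and dy/dt = lx − by + h. The symbols, the parameter values, and the simple Euler integration are assumptions made for this example and are not taken from Rashevsky's text.

```python
def is_stable(k, l, a, b):
    """Stability condition for the linear pair when a, b > 0 and k, l >= 0:
    the restraining terms must outweigh the mutual stimulation, a*b > k*l."""
    return a * b > k * l

def simulate_linear_pair(x0, y0, k, l, a, b, g=0.0, h=0.0, dt=0.01, steps=1000):
    """Euler-integrate dx/dt = k*y - a*x + g and dy/dt = l*x - b*y + h."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        dx = k * y - a * x + g
        dy = l * x - b * y + h
        x, y = x + dt * dx, y + dt * dy
    return x, y

if __name__ == "__main__":
    # Restraint dominates: both quantities die away.
    print(is_stable(1, 1, 2, 2), simulate_linear_pair(1, 1, 1, 1, 2, 2))
    # Mutual stimulation dominates: both quantities run away.
    print(is_stable(2, 2, 1, 1), simulate_linear_pair(1, 1, 2, 2, 1, 1))
```

The two runs show the all-or-none character of such systems: depending on the coefficients, the solution either dies away or grows without limit, which is the kind of behavior that invites comparison with the rapid spread of a fad or a revolt.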

Chapter 4 is an extremely useful chapter from the point of view of the
social scientist not well acquainted with mathematical models and the ade- 
quacy of various kinds of approximations met with in mathematical physics. 
In this chapter entitled “A More Exact Treatment of the Previous Case,” 
Rashevsky shows that allowing individuals to distribute themselves on an 
active-passive scale instead of forcing each individual to assume one or the 
other end of the continuum can lead to essentially the same results as those 
given by the more approximate methods of Chapter 3. This procedure can 
teach the social scientist that a process of simplification in mathematical 
models does not necessarily lead to the loss of essentials. In other words, one 
should not use too glibly the pat phrase: “Of course, this treatment is much 
too over-simplified to be of any real use.” 

Chapters 5 and 6 deal with economic problems raised by the existence 
of persons in a society who are so good at organizing that the whole group
can gain if the workers will work under the direction of the organizers, and 
the organizers are willing to organize the workers. 

Chapter 7 might interest the practical man, although the illustrations may 
seem to him to be tours de force. It suggests how previously developed theory 
can be applied to estimating the ratio of the population of the capital of a 
country to the urban population less that of the capital, using the proportion 
of national income taken in taxes. The population of the capital is taken as 
an index of the size of the governing class and the urban population as an 
index of the total number of active individuals in a society composed of three 
classes: the governing, the organizing, and the passive. The results for Ger- 
many, France, and the United States look very good, but for England the 
capital is too large. The reviewer does not think population of a capital a 
good enough index of size of governing class to make the example convincing. 
Another example is the prediction of the incidence of crime from taxes and 
population density, while still a third example deals with the divorce rate. 
The fit of the calculated quantities to the observed data is rather encouraging. 
It should be mentioned that in Chapter 6 by a sequence of crude approxima- 
tions a formula is obtained for estimating the per capita income of a country 
in terms of its urban population percentage and its population density. The 
results do not seem to fit very well in this case. In all these cases Rashevsky 
feels that the real interest attaches to the fact that certain relations are sug- 
gested even by an inadequate theory which then helps us notice such rela- 
tions when they occur. 

Chapter 9 is concerned with two notions of individual freedom: the first 
concerns economic freedom and is rather suggestive, the second deals with 
freedom of an individual to choose among many activities and seems to the 
reviewer to fall flat on its face. 

Chapter 10 deals with the distribution of the per cent urban population in 
a growing society and considers two or three possible assumptions. Data are 
given showing population of a country against per cent urban. The curve de- 
rived for the United States fits the data pretty well, although a straight line 
would fit them better, but that for Germany is extremely convincing. The 
data for Russia (7 points) are fitted by a two-branch curve with the aid of 
arguments about the reform history of the country. The data for Sweden 
are the most interesting available, but are dismissed with the statement 
that the theory is inadequate to explain them although the rapid reader 
might think that the excellently fitting curve shown is a derived one. The 
reviewer cannot tell whether this curve is derived or not but suspects that it 
is a free-hand fit. 

Chapters 8 and 9 deal with city sizes and do not seem to reach a very suc- 
cessful conclusion. 

Chapters 14 and 15 deal with social classes, social mobility, production, 
and the effects of restrictions. Chapter 17 concerns some consequences of 
previous theory and the theory is extended to estimate the percentage of 
per capita income spent for military purposes in various countries and also
the number of inventions by various countries. The theory also suggests 
that the “influence” of a country is proportional to N²/S, where N is the
population and S the area. Graphs for various countries from 1600 to the 
present do not violate a reader’s intuition about which countries had great 
influence during this period. 

Chapters 19 and 20 are concerned with individualistic as opposed to col-
lectivistic behavior and here Rashevsky draws heavily on G. C. Evans’
Mathematical Introduction to Economics. The principal result is that indi- 
viduals may profit more individually by trying to maximize the group satis- 
faction rather than their individual satisfaction. 

Chapter 21, “Some Considerations of the History of a Few Nations,” dis- 
cusses largely in hand-waving terms what happens under various degrees of 
interclass mobility. In other words, the happenings in several different coun- 
tries, Russia, China, England, and the United States are talked about in 
terms of some of the theory, although not really derived from the theory. 
As in many history books, the discussion of the United States concludes 
“To what extent this shift toward governmental control will continue cannot 
be predicted on the basis of the present theory” (p. 180). 

Chapters 22 and 23 have to do with physical conflict between groups or 
nations. The theory developed is one of the variations of Lanchester’s Law, 
although Lanchester is not mentioned. 
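
The review does not say which variation of Lanchester's Law the theory resembles. For orientation, the classical square law can be sketched as follows, assuming two forces x and y that wear each other down at rates dx/dt = −βy and dy/dt = −αx, so that αx² − βy² stays constant during the engagement. The function name, the parameter values, and the Euler step are illustrative assumptions.

```python
def lanchester_square(x0, y0, alpha, beta, dt=0.001, max_steps=1_000_000):
    """Euler-integrate dx/dt = -beta*y, dy/dt = -alpha*x until one side is
    exhausted (or max_steps is reached); return the surviving strengths."""
    x, y = float(x0), float(y0)
    steps = 0
    while x > 0 and y > 0 and steps < max_steps:
        x, y = x - dt * beta * y, y - dt * alpha * x
        steps += 1
    return max(x, 0.0), max(y, 0.0)

if __name__ == "__main__":
    # Equal effectiveness, strengths 5 against 3: the square law gives
    # sqrt(5**2 - 3**2) = 4 survivors on the larger side.
    print(lanchester_square(5, 3, alpha=1.0, beta=1.0))
```

The design point is that numerical strength enters as a square, so a modest superiority in numbers yields a disproportionate share of survivors.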

A few misprints, mostly minor, were noted. The more important are: 
page 17, equation (9), C should be (; page 18, line 5, > should be >; page 
78, equation (6), delete second equal sign; page 84, equation (16), bar of 
radical should not cover second term. 

The most important thing is that a book has appeared which tries to 
treat a variety of social problems by means of mathematical models. That 
the attempts have met with varying degrees of success is not too important. 
The results given are certainly successful enough to encourage others to 
make further attempts. Indeed, some of the basic material presented here is 
worth extending along the lines indicated by the author and worth supple- 
menting with practical numerical examples drawn from data. 